Bug 1248 - oops with kernel 2.6.20
Status: VERIFIED WONTFIX
: IPW3945
__UNSPECIFIED__
: 1.2.0
: Dell Fedora Core
: P2 normal
Reported: 2007-03-31 06:00 by
Modified: 2008-12-08 22:14


Attachments
a patch to try (472 bytes, patch)
2007-04-02 22:53, Zhu Yi
kernel messages (22.06 KB, text/plain)
2007-04-04 05:17, Brian Millett
2nd try (2.20 KB, patch)
2007-04-08 19:45, Zhu Yi
3nd try (3.44 KB, patch)
2007-04-10 23:52, Zhu Yi
oops for patch #2 (5.04 KB, text/plain)
2007-04-11 05:21, Brian Millett
kernel BUG with "3rd-try" patch (12.18 KB, text/plain)
2007-04-14 03:49, Mario Pascucci
4th try (4.35 KB, patch)
2007-04-16 01:00, Zhu Yi
syslog of firmware error with "4-try" patch (3.80 KB, text/plain)
2007-04-18 14:10, Mario Pascucci



Description From 2007-03-31 06:00:21
I am using the ipw3945 packages from atrpms.  These are the versions:

ipw3945-1.2.0-18.2.fc6.at
ipw3945d-1.7.22-4.at
ipw3945-kmdl-2.6.19-1.2911.6.4.fc6-1.2.0-18.2.fc6.at
ipw3945-kmdl-2.6.19-1.2911.6.5.fc6-1.2.0-18.2.fc6.at
ipw3945-kmdl-2.6.19-1.2911.fc6-1.2.0-18.2.fc6.at
ipw3945-kmdl-2.6.20-1.2925.fc6-1.2.0-18.2.fc6.at
ipw3945-kmdl-2.6.20-1.2933.fc6-1.2.0-18.2.fc6.at
ipw3945-ucode-1.14.2-4.at

ONLY with the 2.6.20 kernel do I randomly get an oops.  This is the latest:

 ipw3945: Microcode SW error detected.  Restarting.
 ipw3945: request scan called when driver not ready.
Mar 30 21:44:13 dufus NetworkManager: <WARNING> 
nm_device_802_11_wireless_get_essid (): error getting ESSID for device eth1:
Resource temporarily unavailable
Mar 30 21:44:15 dufus NetworkManager: <WARNING> 
nm_device_802_11_wireless_get_essid (): error getting ESSID for device eth1:
Resource temporarily unavailable
 ipw3945: Error sending ADD_STA: time out after 500ms.
 invalid opcode: 0000 [#1]
 SMP
 last sysfs file: /class/net/eth0/carrier
 Modules linked in: arc4 ecb blkcipher ieee80211_crypt_wep rfcomm hidp l2cap
bluetooth ohci1394 ieee1394 button usb_storage aes ieee80211_crypt_ccmp vfat fat
ipt_LOG xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink xt_multiport
iptable_filter ip_tables x_tables nfsd exportfs lockd nfs_acl autofs4
vmnet(P)(U) vmmon(P)(U) sunrpc cpufreq_ondemand video sbs i2c_ec dock battery
asus_acpi backlight ac parport_pc lp parport snd_hda_intel snd_hda_codec
snd_seq_dummy ipw3945(F)(U) snd_seq_oss snd_seq_midi_event snd_seq ieee80211
snd_seq_device joydev iTCO_wdt ieee80211_crypt nvidia(P)(U) snd_pcm_oss
snd_mixer_oss iTCO_vendor_support snd_pcm sr_mod snd_timer cdrom serio_raw snd
tg3 ide_cs pcspkr sg soundcore i2c_i801 snd_page_alloc i2c_core dm_snapshot
dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
 CPU:    0
 EIP:    0060:[<c0434847>]    Tainted: PF     VLI
 EFLAGS: 00010206   (2.6.20-1.2933.fc6 #1)
 EIP is at run_workqueue+0x8a/0x125
 eax: f73fcd4c   ebx: f73fe258   ecx: 00000018   edx: 00000018
 esi: f7782e40   edi: 00000246   ebp: f73fe254   esp: f7d2cf6c
 ds: 007b   es: 007b   ss: 0068
 Process ipw3945/0 (pid: 1581, ti=f7d2c000 task=f71ccbf0 task.ti=f7d2c000)
 Stack: 00000000 00000282 f70ccd34 c061ffaf f903b9d4 f7782e40 f70ccd34 f7d2cfbc
        00000000 c04351b4 00000001 00000000 00000000 00010000 00000000 00000000
        f71ccbf0 c04226ab 00100100 00200200 ffffffff ffffffff f7782e40 c04350bb
 Call Trace:
  [<c061ffaf>] _spin_lock_irqsave+0x9/0xd
  [<f903b9d4>] ipw_bg_disassociate+0x0/0x31 [ipw3945]
  [<c04351b4>] worker_thread+0xf9/0x124
  [<c04226ab>] default_wake_function+0x0/0xc
Mar 30 21:44:17 dufus NetworkManager: <WARNING> 
nm_device_802_11_wireless_get_essid (): error getting ESSID for device eth1:
Resource temporarily unavailable
  [<c04350bb>] worker_thread+0x0/0x124
Mar 30 21:44:20 dufus NetworkManager: <WARNING> 
nm_device_802_11_wireless_get_essid (): error getting ESSID for device eth1:
Resource temporarily unavailable
  [<c04377c7>] kthread+0xb0/0xd9
  [<c0437717>] kthread+0x0/0xd9
  [<c0404b33>] kernel_thread_helper+0x7/0x10
  =======================
 Code: e8 9f b7 1e 00 8b 43 fc 83 e0 fc 39 f0 74 04 0f 0b eb fe 8b 43 fc a8 02
75 06 f0 0f ba 73 fc 00 89 e8 ff 54 24 10 89 e0 25 00 f0 <ff> ff 8b 48 14 f7 c1
ff ff ff ef 74 48 65 a1 08 00 00 00 8b 90
------- Comment #1 From 2007-04-02 22:51:05 -------
Can you please try the patch below for ipw3945? After applying the patch,
recompile and reinstall, then run `modprobe ipw3945 debug=0x43fff`. Please report
back with dmesg if you still see the oops.
------- Comment #2 From 2007-04-02 22:53:11 -------
Created an attachment (id=1020) [details]
a patch to try
------- Comment #3 From 2007-04-04 05:17:19 -------
Created an attachment (id=1021) [details]
kernel messages

Ok, got another last night.  I ran dmesg as advised, but all I got was:
 
ated.
ipw3945: I ipw_net_hard_start_xmit Tx attempt while not associated.
(the last line repeated 1846 times)

I'll upload the messages from /var/log/messages about the oops.

thanks.
------- Comment #4 From 2007-04-05 02:25:07 -------
Thanks for testing. Can you tell me under what conditions you got the oops?
During normal transfer or at driver unload time?
------- Comment #5 From 2007-04-05 04:01:58 -------
Not really sure.  I've gotten it at night while the laptop is just sitting,
while I was using it reading email, and once while I was surfing.  It has NOT
happened while I was doing heavy file transfers, compiles, or such.  Last night,
nothing happened at all, so it seems to be random, at least from the user's
perspective.

I'm connecting to a linksys rev 6 wrt54g using WEP personal where the group key
renewal is 600 sec.

Thank you.

I think I mentioned that this just started with the FC6 2.6.20 kernels and the
2.6.19 never had a problem.
------- Comment #6 From 2007-04-07 16:12:35 -------
Hello,
I am also seeing this problem occasionally (no special conditions)
with the  2.6.20-1.2933.fc6 kernel and the drivers from atrpms:

ipw3945d-1.7.22-4.at
ipw3945-ucode-1.14.2-4.at
ipw3945-kmdl-2.6.20-1.2933.fc6-1.2.0-18.2.fc6.at
ieee80211-kmdl-2.6.20-1.2933.fc6-1.2.16-17.fc6.at

Here are the messages; they are similar to those in the earlier reported
case.
Apr  7 15:43:56 localhost kernel: ipw3945: Error sending cmd #07 to daemon: time
out after 500ms.
Apr  7 15:43:58 localhost kernel: ipw3945: Error sending SCAN_ABORT_CMD: time
out after 500ms.
Apr  7 15:43:58 localhost kernel: ipw3945: Error sending cmd #08 to daemon: time
out after 500ms.
Apr  7 15:43:59 localhost kernel: ipw3945: Error sending ADD_STA: time out after
500ms.
Apr  7 15:43:59 localhost kernel: invalid opcode: 0000 [#1]
Apr  7 15:43:59 localhost kernel: SMP 
Apr  7 15:43:59 localhost kernel: last sysfs file:
/devices/pci0000:00/0000:00:1c.1/0000:0c:00.0/cmd
Apr  7 15:43:59 localhost kernel: Modules linked in: vfat fat usb_storage aes
ieee80211_crypt_ccmp(F)(U) ipw3945(F)(U) ieee80211(F)(U) ieee80211_crypt(F)(U)
autofs4 hidp rfcomm l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4
xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp
ip6table_filter ip6_tables x_tables cpufreq_ondemand video sbs i2c_ec dock button battery
asus_acpi backlight ac ipv6 parport_pc lp parport joydev snd_hda_intel
snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device sr_mod nvidia(PF)(U)
snd_pcm_oss snd_mixer_oss tg3 i2c_i801 cdrom serio_raw ohci1394 sdhci iTCO_wdt
pcspkr snd_pcm i2c_core mmc_core ieee1394 iTCO_vendor_support snd_timer sg snd soundcore
snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Apr  7 15:43:59 localhost kernel: CPU:    0
Apr  7 15:43:59 localhost kernel: EIP:    0060:[<c0434847>]    Tainted: PF     VLI
Apr  7 15:43:59 localhost kernel: EFLAGS: 00010216   (2.6.20-1.2933.fc6 #1)
Apr  7 15:43:59 localhost kernel: EIP is at run_workqueue+0x8a/0x125
Apr  7 15:43:59 localhost kernel: eax: f4954d4c   ebx: f4956258   ecx: 00000018
  edx: 00000018
Apr  7 15:43:59 localhost kernel: esi: f694ebc0   edi: 00000246   ebp: f4956254
  esp: f4fedf6c
Apr  7 15:43:59 localhost kernel: ds: 007b   es: 007b   ss: 0068
Apr  7 15:43:59 localhost kernel: Process ipw3945/0 (pid: 3392, ti=f4fed000
task=f6d83470 task.ti=f4fed000)
Apr  7 15:43:59 localhost kernel: Stack: 00000000 00000282 f4918d34 c061ffaf
f8ad19d4 f694ebc0 f4918d34 f4fedfbc 
Apr  7 15:43:59 localhost kernel:        00000000 c04351b4 00000001 00000000
00000000 00010000 00000000 00000000 
Apr  7 15:43:59 localhost kernel:        f6d83470 c04226ab 00100100 00200200
ffffffff ffffffff f694ebc0 c04350bb 
Apr  7 15:43:59 localhost kernel: Call Trace:
Apr  7 15:43:59 localhost kernel:  [<c061ffaf>] _spin_lock_irqsave+0x9/0xd
Apr  7 15:43:59 localhost kernel:  [<f8ad19d4>] ipw_bg_disassociate+0x0/0x31
[ipw3945]
Apr  7 15:43:59 localhost kernel:  [<c04351b4>] worker_thread+0xf9/0x124
Apr  7 15:43:59 localhost kernel:  [<c04226ab>] default_wake_function+0x0/0xc
Apr  7 15:43:59 localhost kernel:  [<c04350bb>] worker_thread+0x0/0x124
Apr  7 15:43:59 localhost kernel:  [<c04377c7>] kthread+0xb0/0xd9
Apr  7 15:43:59 localhost kernel:  [<c0437717>] kthread+0x0/0xd9
Apr  7 15:43:59 localhost kernel:  [<c0404b32>] kernel_thread_helper+0x6/0x10
Apr  7 15:43:59 localhost kernel:  =======================
Apr  7 15:43:59 localhost kernel: Code: e8 9f b7 1e 00 8b 43 fc 83 e0 fc 39 f0
74 04 0f 0b eb fe 8b 43 fc a8 02 75 06 f0 0f ba 73 fc 00 89 e8 ff 54 24 10 89 e0
25 00 f0 <ff> ff 8b 48 14 f7 c1 ff ff ff ef 74 48 65 a1 08 00 00 00 8b 90
Apr  7 15:43:59 localhost kernel: EIP: [<c0434847>] run_workqueue+0x8a/0x125
SS:ESP 0068:f4fedf6c

I'm running on a Dell XPS M1710.
------- Comment #7 From 2007-04-08 19:45:06 -------
Created an attachment (id=1023) [details]
2nd try

Here is another patch to try. Please attach dmesg with debug=0x6bfff if you see
the oops again.

BTW, Dan Krejsa, did you see a firmware error before the oops?
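
For reference, the two debug masks requested in this thread (0x43fff earlier,
0x6bfff here) differ in only a couple of bits. The C sketch below shows one way
to compare two such masks. The helper names are made up for illustration and are
not part of ipw3945; the meaning of each bit is defined by the driver's own
debug header, which is not reproduced in this report.

```c
#include <stdio.h>

/* Hypothetical helpers for comparing debug masks; not part of ipw3945. */

/* Bits that are set in newer but clear in older. */
static unsigned long bits_added(unsigned long older, unsigned long newer)
{
    return newer & ~older;
}

/* Bits that are set in older but clear in newer. */
static unsigned long bits_removed(unsigned long older, unsigned long newer)
{
    return older & ~newer;
}

/* Print the position of every set bit in mask, lowest first. */
static void print_bit_positions(const char *label, unsigned long mask)
{
    printf("%s:", label);
    for (unsigned int bit = 0; mask != 0; bit++, mask >>= 1) {
        if (mask & 1UL)
            printf(" %u", bit);
    }
    printf("\n");
}
```

For this pair, bits_added(0x43fffUL, 0x6bfffUL) is 0x28000 (bit positions 15 and
17) and bits_removed() is 0, so the second mask only enables additional output.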
------- Comment #8 From 2007-04-08 22:33:18 -------
*** Bug 1256 has been marked as a duplicate of this bug. ***
------- Comment #9 From 2007-04-10 23:52:27 -------
Created an attachment (id=1025) [details]
3nd try

The patch works around a stack overwrite bug. Please help test whether it fixes
the oops.
------- Comment #10 From 2007-04-11 05:21:17 -------
Created an attachment (id=1026) [details]
oops for patch #2

I had an oops last night, then logged in and saw "try #3", so I will try it.
This oops is from "try #2".

This is what was on the terminal (minus the syslog part)
dufus kernel: Oops: 0002 [#1]
dufus kernel: SMP 
dufus kernel: CPU:    0
dufus kernel: EIP:    0060:[<f903ba85>]    Tainted: PF	   VLI
dufus kernel: EFLAGS: 00010286	 (2.6.20-1.2933.fc6 #1)
dufus kernel: EIP is at ipw_bg_disassociate+0x46/0x55 [ipw3945]
dufus kernel: eax: 00000001   ebx: f6c2cd4c   ecx: 00000018   edx: 00000246
dufus kernel: esi: f6c2cbd0   edi: 00000246   ebp: f6c2e254   esp: f773bf5c
dufus kernel: ds: 007b	 es: 007b   ss: 0068
dufus kernel: Process ipw3945/0 (pid: 1511, ti=f773b000 task=f701d430
task.ti=f773b000)
dufus kernel: Stack: f9053869 f6c2e258 f7375c40 c0434842 00000000 00000282
f7fc5d34 c061ffaf 
dufus kernel:	     f903ba3f f7375c40 f7fc5d34 f773bfbc 00000000 c04351b4
00000001 00000000 
dufus kernel:	     00000001 00010000 00000000 00000000 f701d430 c04226ab
00100100 00200200 
dufus kernel: Call Trace:
dufus kernel:  [<c0434842>] run_workqueue+0x85/0x125
dufus kernel:  [<c061ffaf>] _spin_lock_irqsave+0x9/0xd
dufus kernel:  [<f903ba3f>] ipw_bg_disassociate+0x0/0x55 [ipw3945]
dufus kernel:  [<c04351b4>] worker_thread+0xf9/0x124
dufus kernel:  [<c04226ab>] default_wake_function+0x0/0xc
dufus kernel:  [<c04350bb>] worker_thread+0x0/0x124
dufus kernel:  [<c04377c7>] kthread+0xb0/0xd9
dufus kernel:  [<c0437717>] kthread+0x0/0xd9
dufus kernel:  [<c0404b33>] kernel_thread_helper+0x7/0x10
dufus kernel:  =======================
dufus kernel: Code: 05 f9 81 eb 08 15 00 00 e8 72 c1 3e c7 e8 83 9b 3c c7 c7 04
24 69 38 05 f9 e8 61 c1 3e c7 89 d8 e8 42 38 5e c7 89 f0 e8 fa fe ff <ff> 89 d8
5b 5b 5e e9 e0 37 5e c7 59 5b 5e c3 55 57 56 89 c6 53 
dufus kernel: EIP: [<f903ba85>] ipw_bg_disassociate+0x46/0x55 [ipw3945] SS:ESP
0068:f773bf5c


I'll try the "try #3".

I'm patching the clean ipw3945.c from the ipw3945-linux-1.2.0.tgz.  Is that
correct?

Thanks.
------- Comment #11 From 2007-04-11 18:19:40 -------
(In reply to comment #10)
> I'll try the "try #3".

Thanks.
 
> I'm patching the clean ipw3945.c from the ipw3945-linux-1.2.0.tgz.  Is that
> correct?

Yes. It would be appreciated if you could load the module with "modprobe ipw3945
debug=0x6bfff" and attach the full log (not only the oops but also the
ipw3945-related logs before it) when trying the 3rd patch.
------- Comment #12 From 2007-04-14 03:49:08 -------
Created an attachment (id=1029) [details]
kernel BUG with "3rd-try" patch

Tested with the patch applied to 1.2.0, on kernel 2.6.20-1.2933.fc6.
After the oops the computer hangs.
------- Comment #13 From 2007-04-15 18:46:07 -------
(In reply to comment #7)
> Created an attachment (id=1023) [edit] [details]
> 2nd try
> 
> Here is another patch to try. Please attach dmesg with debug=0x6bfff if you see
> the oops again.
> 
> BTW, Dan Krejsa, did you see firmware error before the oops?

Hi, sorry for not checking back in a while.  No, I didn't see the firmware
error (no 'ipw3945: Microcode SW error detected').
------- Comment #14 From 2007-04-16 01:00:59 -------
Created an attachment (id=1031) [details]
4th try

I think I've found the root cause. In ipw_send_cmd(), cmd->meta.u.skb shares
storage with cmd->meta.u.source, because both are members of the same union. So
we cannot simply free it whenever it is not NULL. Please try this patch and see
if the oops happens again.
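
The aliasing described above can be illustrated with a minimal C sketch. All
names here (fake_skb, cmd_meta, meta_kind, looks_like_skb, owns_skb) are
hypothetical stand-ins rather than the real ipw3945 definitions, and the tag
field is just one possible way to disambiguate a union; the attached "4th try"
patch is not reproduced here and may solve the problem differently.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the real ipw3945 structures are not shown here. */
struct fake_skb { int len; };

enum meta_kind { META_SOURCE, META_SKB };   /* hypothetical tag */

struct cmd_meta {
    union {
        struct fake_skb *skb;  /* meaningful only when kind == META_SKB */
        void *source;          /* meaningful only when kind == META_SOURCE */
    } u;
    enum meta_kind kind;
};

/* The flawed test: a non-NULL check alone cannot tell the members apart,
 * because skb and source occupy the same storage. */
static int looks_like_skb(const struct cmd_meta *m)
{
    return m->u.skb != NULL;
}

/* A safer test: consult the tag before treating the slot as a buffer. */
static int owns_skb(const struct cmd_meta *m)
{
    return m->kind == META_SKB && m->u.skb != NULL;
}
```

If only u.source has been set, looks_like_skb() still reports true, so code that
frees u.skb on that basis would free memory that was never a buffer. owns_skb()
reports false in that case.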
------- Comment #15 From 2007-04-18 03:37:42 -------
No oops so far; I've been using "4th try" and nothing has happened yet.  I'll keep
testing.  Also, FC6 upgraded to kernel 2.6.20-1.2944, and it still looks good.
------- Comment #16 From 2007-04-18 14:10:17 -------
Created an attachment (id=1033) [details]
syslog of firmware error with "4-try" patch

With the 4-try patch there is no more kernel bug, only a firmware error (see the
attachment). I discovered that there is another access point (probably in the
neighboring house) with a strong signal, on a different channel and with WEP
encryption. I noticed that firmware errors show up more frequently when I take
the notebook to where the signal from the "foreign" AP is stronger. I don't know
if it's related, but I think you should know...
------- Comment #17 From 2008-12-08 21:50:24 -------
The ipw3945 driver was replaced by iwl3945 in the official kernel a long time
ago. We suggest using the iwl3945 driver instead of the obsolete ipw3945 driver.
If you find a bug, please report it with product=iwlwifi and platform="Intel(R)
Wifi Link 3945". Thanks so much!