Bugzilla – Bug 1096
soft lockup while loading the driver
Last modified: 2008-12-08 22:08:20
This bug comes from bug #1089 soft lockup found while loading the driver (when calibrating the card).
Could the bug be related to what I am seeing under 2.6.17-1.2145_FC5smp?
Jul 9 11:32:26 tako kernel: BUG: soft lockup detected on CPU#0!
Jul 9 11:32:26 tako kernel: <c044a9b6> softlockup_tick+0xad/0xc4 <c042d874> update_process_times+0x39/0x5c
Jul 9 11:32:26 tako kernel: <c0418af3> smp_apic_timer_interrupt+0x5a/0x63 <c040490f> apic_timer_interrupt+0x1f/0x24
Jul 9 11:32:26 tako kernel: <c0411f23> delay_pmtmr+0xb/0x13 <c04e9999> __delay+0x9/0xa
Jul 9 11:32:26 tako kernel: <f8a474d4> ipw_bg_alive_start+0xc0/0x1a1 [ipw3945] <c043317e> run_workqueue+0x86/0xc6
Jul 9 11:32:26 tako kernel: <c0436650> remove_wait_queue+0xaf/0xb9 <f8a47414> ipw_bg_alive_start+0x0/0x1a1 [ipw3945]
Jul 9 11:32:26 tako kernel: <c0433ade> worker_thread+0x0/0x106 <c0433bb3> worker_thread+0xd5/0x106
Jul 9 11:32:26 tako kernel: <c041f2ff> default_wake_function+0x0/0xc <c04362e3> kthread+0x9d/0xc9
Jul 9 11:32:26 tako kernel: <c0436246> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb
(In reply to comment #6)
> Jul 9 11:32:26 tako kernel: BUG: soft lockup detected on CPU#0!
> Jul 9 11:32:26 tako kernel: <c044a9b6> softlockup_tick+0xad/0xc4 <c042d874> ....
> Jul 9 11:32:26 tako kernel: <c0411f23> delay_pmtmr+0xb/0x13 <c04e9999> __delay+0x9/0xa
> Jul 9 11:32:26 tako kernel: <f8a474d4> ipw_bg_alive_start+0xc0/0x1a1 [ipw3945]
The only possibly problematic area is where we use udelay in a busy loop, waiting for the thermal sensor to kick in. Please use debug=0x4 when loading the driver, and attach the debug log.
Hmm.. For me (Ubuntu Dapper, ipw3945-1.1.0-pre2) modprobe ipw3945 debug=0x4 doesn't seem to output anything in dmesg. Also I do not get this softlockup so my inputs may not be helpful here.
Since the reporter cannot reproduce this problem, I am marking it as fixed. If anyone sees this bug again, please reopen.
I have this lockup too:
BUG: soft lockup detected on CPU#0!
[<c014185c>] softlockup_tick+0x9c/0xe0
[<c0126721>] update_process_times+0x31/0x80
[<c010f2da>] smp_apic_timer_interrupt+0x5a/0x60
[<c0103a23>] apic_timer_interrupt+0x1f/0x24
[<c01d007b>] alloc_disc_node+0x7b/0xd0
[<f8a9b213>] ipw_bg_alive_start+0x73/0xa0 [ipw3945]
[<c012d11b>] run_workqueue+0x7b/0xf0
[<f8a9b1a0>] ipw_bg_alive_start+0x0/0xa0 [ipw3945]
[<c012d8e7>] worker_thread+0x117/0x140
[<c0116700>] default_wake_function+0x0/0x10
[<c012d7d0>] worker_thread+0x0/0x140
[<c01303c7>] kthread+0xf7/0x100
[<c01302d0>] kthread+0x0/0x100
[<c0100e15>] kernel_thread_helper+0x5/0x10
It's a Debian 2.6.18 kernel with SMP. The lockup occurs a second or two after loading the ipw3945 module. ipw3945 (version 1.1.0) doesn't have a debug option.
This does not occur until the ipw3945d daemon is loaded, and only if the kill switch is on.
Alexander, can you load the driver with debug=0x43fff and attach the dmesg output from when the soft lockup happens? Thanks!
This is what I copied from the screen:
ipw3945: Intel(R) PRO/Wireless 3945 Network Connection driver for Linux, 1.1.0dmpr
ipw3945: Copyright(c) 2003-2006 Intel Corporation
ipw3945: U ipw_pci_probe pci_resource_len = 0x00001000
ipw3945: U ipw_pci_probe pci_resource_base = f8832000
ipw3945: U ipw_pci_probe Auto associate disabled.
ipw3945: Detected Intel PRO/Wireless 3945ABG Network Connection
ipw3945: U ipw_get_fw Loading firmware 'ipw3945.ucode' file (111572 bytes)
ipw3945: U ipw_pci_probe Waiting for ipw3945d to request INIT.
[...loading ipw3945d...]
ipw3945: U ipw_handle_daemon_set_state UNINIT state requested by daemon.
ipw3945: U ipw_clear_free_frames 0 frames on pre-allocated heap on clear.
ipw3945: U ipw_handle_daemon_set_state INIT state requested by daemon.
ipw3945: U ipw_power_init_handle Intialize power
ipw3945: U ipw_power_init_handle adjust power command flags
ipw3945: U ipw_nic_init HW Revision ID = 0x2
ipw3945: U ipw_nic_init ALM-MM type
ipw3945: U ipw_nic_init SKU OP mode is basic
ipw3945: U ipw_nic_init 3945ABG revision is 0xF1
ipw3945: U ipw_nic_init Card M type B version is 0x2
ipw3945: U ipw_download_ucode 3945ABG card ucode download is good
ipw3945: U ipw_download_ucode 3945ABG card ucode download is good
ipw3945: U ipw_verify_ucode ucode image is good
ipw3945: U ipw_card_show_info 3945ABG HW Version 0.0.241
ipw3945: I ipw_rx_handle Alive ucode status 0x00000001 revision 0x1 0x0
ipw3945: U ipw_card_show_info 3945ABG PBA Number D26972003
ipw3945: U ipw_card_show_info eeprom value at byte 0x94 is 0x02
ipw3945: U ipw_card_show_info EEPROM_ANTENNA_SWITCH_TYPE is 0x01
ipw3945: U ipw_up MAC address: 00:13:02:XX:XX:XX
ipw3945: U ipw_alive_start Alive received.
BUG: soft lockup detected on CPU#0!
[<c014186c>] softlockup_tick+0x9c/0xe0
[<c0126731>] update_process_times+0x31/0x80
[<c010f2ea>] smp_apic_timer_interrupt+0x5a/0x60
[<c0103a23>] apic_timer_interrupt+0x1f/0x24
[<c01dd414>] delay_tsc+0x10/0x20
[<c01dd456>] __delay+0x6/0x10
[<f8a524bb>] ipw_bg_alive_start+0xfb/0x1e0 [ipw3945]
[<c012d12b>] run_workqueue+0x7b/0xf0
[<f8a523c0>] ipw_bg_alive_start+0x0/0x1e0 [ipw3945]
[<c012d8f7>] worker_thread+0x117/0x140
[<c0116710>] default_wake_function+0x0/0x10
[<c012d7e0>] worker_thread+0x0/0x140
[<c01303d7>] kthread+0xf7/0x100
[<c01302e0>] kthread+0x0/0x100
[<c0100e15>] kernel_thread_helper+0x5/0x10
Created an attachment (id=947) [details] patch to try
It seems that when RF kill is on, we busy-loop forever waiting for the thermal sensor (in ipw_alive_start). The patch exits the busy wait after a certain amount of time. Thanks, Hong
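For readers following along, the idea of bounding the wait can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the attached patch: `fake_sensor`, `sensor_ready` and `wait_for_sensor` are hypothetical names, and the poll count stands in for whatever time limit the real patch uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hardware: the sensor reports ready
 * after `ready_after` polls, or never if `ready_after` is negative
 * (which models the RF-kill-on case described above). */
struct fake_sensor {
    int polls;
    int ready_after;
};

static bool sensor_ready(struct fake_sensor *s)
{
    s->polls++;
    return s->ready_after >= 0 && s->polls > s->ready_after;
}

/* Poll at most max_polls times instead of looping forever.
 * Returns 0 once the sensor is ready, -1 if the limit is reached.
 * In a driver, each iteration would also include a short delay. */
static int wait_for_sensor(struct fake_sensor *s, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (sensor_ready(s))
            return 0;
    }
    return -1; /* give up so the caller can continue or report an error */
}
```

With a bound in place, the worker returns control even when the sensor never responds, which is the condition the soft lockup detector was complaining about.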
This bug is also being reported over at Ubuntu's bug tracker: https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.17/+bug/64125 It looks like you've gotten further over here - hopefully we can help fix this. Duncan
Testing of this patch is also happening over here: https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.17/+bug/63418 Seems to work fine so far. Is this patch just a temporary workaround, or a final solution?
Created an attachment (id=956) [details] new patch to try
change status
Merged for 1.1.3
I still have this lockup. I compiled a vanilla 2.6.19 kernel with ipw3945 1.1.2 and the patch from 2006-11-12. The ipw3945 messages are the same, but the stack trace is slightly different:
Starting ipw3945d daemon.
ipw3945: U ipw_handle_daemon_set_state UNINIT state requested by daemon.
ipw3945: U ipw_clear_free_frames 0 frames on pre-allocated heap on clear.
ipw3945: U ipw_handle_daemon_set_state INIT state requested by daemon.
ipw3945: U ipw_power_init_handle Intialize power
ipw3945: U ipw_power_init_handle adjust power command flags
ipw3945: U ipw_nic_init HW Revision ID = 0x2
ipw3945: U ipw_nic_init ALM-MM type
ipw3945: U ipw_nic_init SKU OP mode is basic
ipw3945: U ipw_nic_init 3945ABG revision is 0xF1
ipw3945: U ipw_nic_init Card M type B version is 0x2
ipw3945: U ipw_download_ucode 3945ABG card ucode download is good
ipw3945: U ipw_download_ucode 3945ABG card ucode download is good
ipw3945: U ipw_verify_ucode ucode image is good
ipw3945: U ipw_card_show_info 3945ABG HW Version 0.0.241
ipw3945: I ipw_rx_handle Alive ucode status 0x00000001 revision 0x1 0x0
ipw3945: U ipw_card_show_info 3945ABG PBA Number D26972003
ipw3945: U ipw_card_show_info eeprom value at byte 0x94 is 0x02
ipw3945: U ipw_card_show_info EEPROM_ANTENNA_SWITCH_TYPE is 0x01
ipw3945: U ipw_up MAC address: 00:13:02:XX:XX:XX
ipw3945: U ipw_alive_start Alive received.
BUG: soft lockup detected on CPU#0!
[<c014f86c>] softlockup_tick+0x9c/0xe0
[<c012f4b1>] update_process_times+0x31/0x80
[<c0115e40>] smp_apic_timer_interrupt+0x90/0xb0
[<c0103bf3>] apic_timer_interrupt+0x1f/0x24
[<c01f007b>] blk_trace_ioctl+0x2cb/0x370
[<f8ad3450>] ipw_bg_alive_start+0x19b/0x200 [ipw3945]
[<c01361eb>] run_workqueue+0x7b/0xf0
[<f8ad3450>] ipw_bg_alive_start+0x0/0x200 [ipw3945]
[<c0136de7>] worker_thread+0x117/0x140
[<c011eb80>] default_wake_function+0x0/0x10
[<c0136cd0>] worker_thread+0x0/0x140
[<c0139b07>] kthread+0xf7/0x100
[<c0139a10>] kthread+0x0/0x100
[<c0103d43>] kernel_thread_helper+0x7/0x14
It still happens only if the kill switch is on.
Created an attachment (id=959) [details] read-rfkill-register patch It seems the driver didn't receive the CARD_STATE_NOTIFICATION. The new patch reads the RFKILL status register directly. Would you please give it a try? If it still fails, please attach the dmesg. Thanks, Hong
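The approach described here, consulting the hardware state directly rather than relying on a notification that may never arrive, can be sketched as follows. This is a userspace illustration under stated assumptions: `fake_card`, `FAKE_RFKILL_BIT` and both functions are hypothetical, and the real register layout of the 3945 is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit: when set, the hardware kill switch is on. */
#define FAKE_RFKILL_BIT 0x1u

struct fake_card {
    bool     cached_rfkill; /* only updated if a notification arrives */
    uint32_t status_reg;    /* stand-in for a memory-mapped status register */
};

/* Ask the "hardware" directly instead of trusting the cached state. */
static bool rfkill_from_register(const struct fake_card *c)
{
    assert(c != NULL);
    return (c->status_reg & FAKE_RFKILL_BIT) != 0;
}

/* Assumption taken from the thread: with RF kill on, the thermal sensor
 * never becomes ready, so the wait should be skipped in that case. */
static bool should_wait_for_sensor(const struct fake_card *c)
{
    return !rfkill_from_register(c);
}
```

The cached field is left in the struct only to show what goes stale when the notification is missed; the decision itself never reads it.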
I will try it. I just recompiled it with some additional debug messages, to confirm that it really is the loop waiting for the thermal sensor. I set the debug flags to 0x63fff to see whether a card state notification occurs, and it didn't. When the kill switch is off, this notification also comes later (after calibration and channel scanning). I will try this patch now.
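As a small aid for comparing the two debug masks mentioned in this thread (0x43fff and 0x63fff), the snippet below computes which bits the second mask adds. It is plain bit arithmetic; it makes no claim about what the extra bit means inside the driver, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that are set in new_mask but not in old_mask. */
static uint32_t added_bits(uint32_t old_mask, uint32_t new_mask)
{
    return new_mask & ~old_mask;
}

/* Index of the lowest set bit; v must be non-zero. */
static int lowest_bit_index(uint32_t v)
{
    assert(v != 0);
    int i = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        i++;
    }
    return i;
}
```

Calling `added_bits(0x43fff, 0x63fff)` yields 0x20000, that is, exactly one additional bit at index 17.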
This patch works fine for me.
(In reply to comment #19)
> This patch works fine for me.
Can confirm that the patch fixes the soft lockup problem on ArchLinux with kernel 2.6.19 on a Dell D620. However, a related (?) issue remains:
- If the kill switch is ON when starting ipw3945d, ipw3945d must be restarted after the kill switch is switched OFF (i.e. radio ON) in order for wireless to function.
- If the kill switch was OFF while ipw3945d was started, it can be turned ON and OFF without problems.
- AFAIK this issue was not present in earlier versions (i.e. 1.0.2) of the driver, but I am unsure.
Is this a driver or ipw3945d problem?
[see also: bug 1130] [see also: bug 956]
Created an attachment (id=962) [details] read-rfkill-register-2nd patch
(In reply to comment #20) > However a related (?) issue remains: > -If kill switch is ON when starting ipw3945d, ipw3945d must be restarted after > kill switch is switched OFF (i.e. radio ON) in order for wireless to function. Would you please try the read-rfkill-register-2nd patch to see if the problem is solved? Thanks, Hong
*** Bug 1166 has been marked as a duplicate of this bug. ***
*** Bug 1130 has been marked as a duplicate of this bug. ***
> > Would you please try the read-rfkill-register-2nd patch to see if the problem is > solved? > > Thanks, > Hong Confirmed. Fixes issue for me. Wireless network now comes on-line after a few seconds when kill switch turned off (i.e. radio switched on). You may mark as fixed as far as I am concerned. Thanks!
Move this bug to FIXED. User resolved problem.
I'm running 1.1.3, which is in the Ubuntu Feisty 2.6.20 kernel. I still see this bug - with RF KILL on, I cannot boot. The strange thing is that I've tried the patches on the 2.6.19 kernel, and that worked, but not with the 1.1.3 version. Reopening.
Any ideas why the 1.1.3 version does not work? I'd be happy to try a patch or help debug - this issue is starting to get on my nerves :-)
(In reply to comment #28)
> Any ideas why the 1.1.3 version does not work? I'd be happy to try a patch or
> help debug - this issue is starting to get on my nerves :-)
Would you please try the read-rfkill-register-2nd patch (it is against ipw3945-1.1.2)? The 1.1.3 version doesn't include this patch. I will post one against 1.2.0 later. Thanks, Hong
*** Bug 1178 has been marked as a duplicate of this bug. ***
(In reply to comment #29)
> Would you please try the read-rfkill-register-2nd patch (it is against
> ipw3945-1.1.2)? The 1.1.3 version doesn't include this patch.
The patch does not seem to work against 1.1.3 -- could you supply a new one?
>patching file ipw3945.c
>Hunk #1 succeeded at 15895 (offset 88 lines).
>Hunk #2 FAILED at 15911.
>1 out of 2 hunks FAILED -- saving rejects to file ipw3945.c.rej
>patching file ipw3945.h
>Hunk #1 succeeded at 1170 (offset 3 lines).
Created an attachment (id=968) [details] patch against 1.2.0
(In reply to comment #31)
> The patch does not seem to work against 1.1.3 -- could you supply a new one?
Please try the latest patch against 1.2.0.
(In reply to comment #33)
> Please try the latest patch against 1.2.0.
Creepy. This patch wasn't here earlier this evening when I wrote my own. Anyhow, this patch (at least in principle, since mine was based in concept on the read-rfkill-register-2nd patch) works on my X60s, which was suffering from lockups during boot if the rf kill switch was set to disable the interface.
However, this brought up another issue -- with this fix in place, there is a fairly high probability that switching the wireless back on using the rf-kill switch doesn't start up the "eth1" interface. It appears that if you power on the machine rf-killed, the switch events don't make it to the driver. If you unload and reload the driver (and of course stop then start ipw3945d) the interface appears, and the driver is then sensitive to throws of the switch.
Most unfortunately, since /sys/bus/pci/drivers/ipw3945/*/rf_kill uses the cached priv->status to report hardware killswitch status, it's impossible for a script to tell whether we are in the rf_kill state, or whether we just booted into the rf_kill state and are therefore unable to detect switch throws. I've added a patch to create an rf_kill_hw sys node that reports the switch status without caching. This way, if a script (like ifup) finds a discrepancy, it knows to just restart the ipw3945 driver.
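The discrepancy check described above could look roughly like the sketch below. It is written against ordinary files so it can be tried without the hardware. The assumptions are loud ones: both nodes are taken to contain a single integer, any non-zero value is treated as "killed", and `read_int_file` and `rfkill_states_disagree` are hypothetical helpers, not part of the driver or of the attached patch.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysfs-style text file. Returns -1 on any
 * failure (this sketch assumes valid values are never negative). */
static int read_int_file(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value = 0;
    int matched = fscanf(f, "%d", &value);
    fclose(f);
    return matched == 1 ? value : -1;
}

/* Compare the cached node with the uncached hardware node.
 * Returns 1 if they disagree about "killed or not", 0 if they agree,
 * and -1 if either file could not be read. */
static int rfkill_states_disagree(const char *cached_path, const char *hw_path)
{
    int cached = read_int_file(cached_path);
    int hw = read_int_file(hw_path);
    if (cached < 0 || hw < 0)
        return -1;
    return (cached != 0) != (hw != 0);
}
```

A script or small tool would pass the two concrete node paths for its device and restart the driver when the result is 1.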
Created an attachment (id=969) [details] Provide /sys access to the hardware register containing the rf_kill switch status Provide /sys access to the hardware register containing the rf_kill switch status
(In reply to comment #33)
> (In reply to comment #31)
> > The patch does not seem to work against 1.1.3 -- could you supply a new one?
>
> Please try the latest patch against 1.2.0.
>
The driver compiles but fails to load:
FATAL: Error inserting ipw3945 (/lib/modules/2.6.19-beyond/kernel/drivers/net/wireless/ipw3945/ipw3945.ko): Operation not permitted
dmesg shows:
ipw3945: Unknown symbol ipw_released_restricted_access
The following warning is given during compilation:
warning: implicit declaration of function 'ipw_released_restricted_access'
(In reply to comment #36) > warning: implicit declaration of function 'ipw_released_restricted_access' ok, seems you made a typo. it should be 'ipw_release_restricted_access'. correcting this in the patch allows the driver to compile. the corrected patch seems to fix the lockup/kill switch problem on my system.
> the corrected patch seems to fix the lockup/kill switch problem on my system. scratch that. the soft lockup is indeed gone. wireless however, does NOT come on-line after switching the kill switch off (i.e. radio on) if the system was booted with kill switch on (i.e. radio off).
(In reply to comment #38) > scratch that. the soft lockup is indeed gone. wireless however, does NOT come > on-line after switching the kill switch off (i.e. radio on) if the system was > booted with kill switch on (i.e. radio off). Would you please check whether the daemon /sbin/ipw3945d is running (ps -ef | grep ipw3945d) before you switch the rfkill switch off? It seems that the daemon is not started when booting the kernel. Thanks, Hong
(In reply to comment #25)
> > Would you please try the read-rfkill-register-2nd patch to see if the
> > problem is solved?
> >
> > Thanks,
> > Hong
>
> Confirmed. Fixes issue for me. Wireless network now comes on-line after a few
> seconds when kill switch turned off (i.e. radio switched on).
>
> You may mark as fixed as far as I am concerned. Thanks!
Do you mean that it works with ipw3945-1.1.2 with the read-rfkill-register-2nd patch? That's strange, since the two patches are almost the same. Thanks, Hong
(In reply to comment #34)
> However, this brought up another issue -- with this fix in place, with a certain
> high probability switching the wireless back on using the rf-kill switch doesn't
> start up the "eth1" interface. It appears that if you power on the machine
> rf-killed, the switch events don't make it to the driver. If you unload and
> reload the driver (and of course stop then start the ipw3945d) the interface
> appears, and the driver is then sensitive to throws of the switch.
(In reply to comment #40)
> Do you mean that it works with ipw3945-1.1.2 with the read-rfkill-register-2nd
> patch? It's strange, the two patches are almost the same.
>
> Thanks,
> Hong
OK. I tried it again and wireless did come on this time. My experience seems consistent with post #34. Sometimes kill switch events don't register with the driver (dmesg shows nothing except an atkbd.c message), sometimes they do. I have not experienced this with 1.1.2. However, it could be that I missed it, since the problem only manifests itself part of the time. Will downgrade and report back.
(In reply to comment #39)
> (In reply to comment #38)
> > scratch that. the soft lockup is indeed gone. wireless however, does NOT come
> > on-line after switching the kill switch off (i.e. radio on) if the system was
> > booted with kill switch on (i.e. radio off).
>
> Would you please check whether the daemon /sbin/ipw3945d is running (ps -ef |
> grep ipw3945d) before you switch the rfkill switch off?
> It seems that the daemon is not started when booting the kernel.
>
> Thanks,
> Hong
If the system was booted with the radio off, I can't turn on the radio (Fn+F2 keys used on a Dell Inspiron 6400 laptop). After the system starts without radio I see:
# ps -ef | grep ipw3945d
root 3217 1 0 10:16 ? 00:00:00 /sbin/ipw3945d --quiet
root 5283 5250 0 10:32 pts/0 00:00:00 grep ipw3945d
And I can't start the radio...
(In reply to comment #41)
> I have not experienced this with 1.1.2. However, it could be that I missed it,
> since the problem only manifests itself part of the time. Will downgrade and
> report back.
Ok, it seems the patched 1.1.2 does not allow wireless to come on reliably either. For reference, here is what dmesg gives when I do `modprobe ipw3945; /etc/rc.d/ipw3945d start':
ipw3945: Intel(R) PRO/Wireless 3945 Network Connection driver for Linux, 1.1.2dmpr
ipw3945: Copyright(c) 2003-2006 Intel Corporation
ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:0c:00.0 to 64
ipw3945: Detected Intel PRO/Wireless 3945ABG Network Connection
Switching the kill switch just produces atkbd.c errors:
atkbd.c: Unknown key pressed (translated set 2, code 0x88 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e008 <keycode>' to make it known.
atkbd.c: Unknown key pressed (translated set 2, code 0x91 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e011 <keycode>' to make it known.
I am quite sure the patched 1.1.2 worked for me before, so I am really puzzled as to what is going on. BTW, kill switch events do register when the radio was on during 'modprobe ipw3945; /etc/rc.d/ipw3945d start'.
Patch from comment
The fixed patch from comment #32 has indeed solved the lockup problem, but switching the network on and off on the fly still does not work.
Would you please change line 117 in ipw3945.c to "static int debug=0x43fff" and recompile the driver? Then reboot the system and provide the dmesg. Thanks, Hong
Set it as NEEDMOREDATA according to comment #46.
Created an attachment (id=991) [details] dmesg of system booted with kill switch enabled Here you go.
*** Bug 1197 has been marked as a duplicate of this bug. ***
Well, the thing I've posted does not look like debug output. Are you sure the value is ok?
This patch seems to help. Any chance of getting it released in an updated package for downstream?
patch committed. Will be included in the next release.
Created an attachment (id=1030) [details] Fixed patch against 1.2.0 Fixed typo inside the patch.
Created an attachment (id=1038) [details] ipw3945.c definitively patched This is my personal ipw3945 source C++ file. It has to be put in the ipw3945 source folder and compiled. The real bug of this driver was simply a pair of assignments: other patches made before this source are now obsolete. Have a nice day!
(In reply to comment #55) > Created an attachment (id=1038) [edit] [details] > ipw3945.c once and for all patched > > This is my personal ipw3945 C++ source file. It has to be put in ipw3945 source > folder and compiled. > > The real bug of this driver was simply a pair of assignments: other patches made > before this source are now obsolete. > > Have a nice day!
Verified in 1.2.1.
(In reply to comment #57)
> Verified in 1.2.1.
The 1.2.1 changelog is full of fixes, but most of them are only palliative treatments for a bug that had not really been found (in fact, 1.2.1 has the same bug as 1.2.0, but moved from kernel space to user space: the kernel goes on, but the connection fails). I hope to see ipw3945-1.2.2 as soon as possible, without any useless patches. Have a nice day!
I apologize for my "anger": I'm a little bit tired... I didn't want to be rude or arrogant. I'm sorry.
I'm sorry: I have to reopen this bug since the 1.2.1 release still does not work. The old patches solve the soft lockup problem, but the connection fails 25% of the time. I am really sad to see that no one has considered my solution: I've sent my source to thousands of people who had the same problem, and they were very happy to see that my solution works. This is the last time I write: 1.2.1 does not work. Please release 1.2.2 with my solution, or I will have to send my patch all over the planet. Thanks.
Created an attachment (id=1052) [details] patch from Egon This is a diff between ipw3945.c from the 1.2.0 source and the one from attachment #1038 [details].
The ipw3945 driver was replaced by iwl3945 in the official kernel a long time ago. We suggest using the iwl3945 driver instead of the obsolete ipw3945 driver. If you find a bug, please report it with product=iwlwifi and platform="Intel(R) Wifi Link 3945". Thanks so much!