Bugzilla – Bug 1260
Microcode SW error (SYSASSERT (#5)) under high load
Last modified: 2008-12-08 22:14:39
Hi!

I have the problem that the driver crashes with "ipw3945: Microcode SW error detected. Restarting.". Every time, just before this, I see the message "Error sending LEDS_CMD: time out after 500ms." in the kernel log. It *seems* to me that this problem occurs when I transmit a large amount of data (so the LED can't flash fast enough? ;)).

I really cannot tell whether this problem also occurred with older firmware versions. I also had some Microcode SW errors back then, but I didn't check whether a LEDS_CMD timeout preceded them (and I couldn't even tell you the firmware version - it was shipped with Debian without any version information :().

The problem occurs regardless of the distance to the Access Point (sitting right beside it or one floor below doesn't change anything).

Additional information below.

Greetings,
Sebastian

System Information
==================
Machine: Lenovo/IBM Thinkpad X60s with an IPW3945
Firmware version: 1.14.2, freshly downloaded from http://bughost.org/ipw3945/
Linux Kernel version: 2.6.20.1 (almost vanilla, except bootsplash; Debian standard configuration)
ipw3945d version: 1.7.22 (from the Debian package, version 1.7.22-4)
Access Point: Thomson Speedtouch 585i v6 with firmware version 5.4.0.14
Security: WPA2-PSK
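The observation that a LEDS_CMD timeout precedes every microcode error can be checked mechanically against a saved kernel log. The following is only a minimal sketch: the two message strings are the ones quoted in this report, while the function name, the `window` parameter, and the sample lines are assumptions made up for illustration.

```python
SW_ERROR = "Microcode SW error detected"
LED_TIMEOUT = "Error sending LEDS_CMD"


def errors_with_preceding_timeout(lines, window=20):
    """Return (line_index, preceded) for every microcode error line.

    `preceded` is True when a LEDS_CMD timeout appears within the
    `window` lines directly before the error. `window` is an arbitrary
    choice for this sketch, not a value taken from the driver.
    """
    results = []
    for i, line in enumerate(lines):
        if SW_ERROR in line:
            start = max(0, i - window)
            preceded = any(LED_TIMEOUT in prev for prev in lines[start:i])
            results.append((i, preceded))
    return results


# Hypothetical sample: the messages are quoted from the report,
# the ordering and the filler line are invented for the example.
sample = [
    "Error sending LEDS_CMD: time out after 500ms.",
    "ipw3945: Microcode SW error detected. Restarting.",
    "(unrelated line, made up for this example)",
    "ipw3945: Microcode SW error detected. Restarting.",
]
print(errors_with_preceding_timeout(sample, window=1))
```

Feeding it the lines of a real log (for example `open(path).read().splitlines()`) would show whether any error occurs without a preceding timeout, which would argue against the LED theory.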
Created an attachment (id=1027) [details] Kernel log with debug=0x43fff
Created an attachment (id=1028) [details] Kernel configuration Just in case that matters :-)
Hi!

I investigated a bit further:
- The problem is probably *not* related to a LEDS_CMD timeout, as it still exists with ipw3945 loaded with led=0 (although "ipw_queue_tx_hcmd Sending command LEDS_CMD (#48), seq: ..." still shows up in the debug log...)
- The problem is probably not related to WPA either; it also occurs with WPA or WEP completely disabled.
- I also tried the firmware attached to bug #1085; the problem remains the same.
- The problem is indeed the high load. I can reliably reproduce it by scp-ing a large file. (Large, because the problem disappears when the network throughput drops after a few seconds.)
- Maybe the bug is related to bug #1201? The debug output looks much the same to me (SYSASSERT #5), although my system is not unresponsive at all.

Is there any more information I could provide you?

Sebastian
Created an attachment (id=1035) [details] Kernel log with debug=0x43fff and led=0
Hmm,

I found out something curious (at least to me): when I set CONFIG_PREEMPT=y, the bug disappears and the connection works as expected. (This was the case in 2.6.21.5, I didn't test it with .21.)

I would have expected it the other way round... but - well, I'm not a kernel hacker. Maybe this is some kind of lock or whatever being held?

Sebastian
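Since the preemption setting turns out to matter, it helps to state exactly which model a kernel was built with when reporting. A minimal sketch, assuming the kernel configuration is available as text (commonly under /boot/config-<release>, or /proc/config.gz when the kernel exports it); the function name and the sample configuration are made up for illustration:

```python
PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",       # no forced preemption
    "CONFIG_PREEMPT_VOLUNTARY",  # voluntary ("partial") preemption
    "CONFIG_PREEMPT",            # full kernel preemption
)


def preemption_model(config_text):
    """Return the preemption option set to 'y' in a kernel config, or None."""
    enabled = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Lines such as "# CONFIG_PREEMPT is not set" are comments.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in PREEMPT_OPTIONS and value == "y":
            enabled.add(key)
    for name in PREEMPT_OPTIONS:
        if name in enabled:
            return name
    return None


# Hypothetical config fragment, not taken from the attached configuration.
sample_config = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
"""
print(preemption_model(sample_config))
```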
I have a very similar problem here.

Specs:
Sony VAIO SZ-350BP (Brazilian version)
Kernel: 2.6.20-gentoo-r8 (gentoo-sources)
ipw3945: 1.2.0
ipw3945-ucode: 1.14.2
ipw3945d: 1.7.22-r4

The dmesg output is pretty much the same, so I won't post it here.

It seems that switching between X and VTs aggravates the problem: a few switches (3 or 4) are enough to bring down the connection.

I've just tried activating full kernel preemption (it was set to partial kernel preemption) and the connection seems much more stable. On the other hand, switching from the text console to X now causes the ipw to restart *every time* (the problem now happens only when switching from a VT to X, not the other way).

I'll gladly provide more information whenever it's needed; this bug has been freaking me out for some time now.

Greetings,
Cesar Kawakami
(In reply to comment #5)
> Hmm,
>
> I found out something curious (at least for me):
> when I set CONFIG_PREEMPT=y, the bug disappears and the connection works as
> expected. (This was the case in 2.6.21.5, I didn't test it with .21.)
>
> I would have expected this the other way round.. but - well, I'm not a kernel
> hacker. Maybe this is some kind of lock or whatever being held?
>
> Sebastian

I have the same option set on kernel 2.6.23 (it also happened on 2.6.22) and I still get the same message about the LEDS_CMD timeout. It happens periodically, and when it does I have a lot of trouble getting things back. Sometimes I have to force a reboot because I can't get the interface to recover.

My output is pretty much the same as what has already been posted, so I won't spam here :)
Enabling full preemption made the problem much better here, too. I'm not sure whether I still see it occasionally or whether it's completely gone. It might also still depend on system load, with preemption just hiding it to a large extent.
Thanks for the reports. Zhu Yi has been working on/with some patches sent in by a community member just in the past day or two ... these look promising. It turns out that we've been unnecessarily using the heavy-handed spin_lock_irqsave(), which turns off interrupts (not just our own!), hence the system lock-ups. Also, we were not replenishing the Rx queues as often as we could, so Rx buffers could pile up under heavy traffic and/or heavy CPU usage while we were still processing earlier Rx and command (e.g. Tx) response buffers. Hence the firmware errors under heavy traffic: the firmware runs out of buffer space to put things in. New patches should fix these issues. I'll let Yi take it from here. -- Ben --
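The Rx starvation described above can be illustrated with a toy model. This is a sketch under stated assumptions, not driver code: the function, its parameters, and the numbers are invented for illustration, and the model simply counts a dropped frame where the real firmware would raise an error.

```python
def simulate(ring_size, arrivals_per_tick, processed_per_tick, ticks, replenish_every):
    """Count frames that arrive while no free Rx buffer is available.

    free   : buffers the device may fill
    filled : buffers waiting for the driver to process
    used   : buffers already processed but not yet handed back
    """
    free, filled, used, dropped = ring_size, 0, 0, 0
    for tick in range(1, ticks + 1):
        # Device side: every arriving frame needs a free buffer.
        for _ in range(arrivals_per_tick):
            if free > 0:
                free -= 1
                filled += 1
            else:
                dropped += 1
        # Driver side: process some of the filled buffers.
        done = min(filled, processed_per_tick)
        filled -= done
        used += done
        # Processed buffers go back to the device only every N ticks.
        if tick % replenish_every == 0:
            free += used
            used = 0
    return dropped


# Same traffic and processing rate, different replenish frequency.
print(simulate(8, 2, 2, 16, replenish_every=1))  # buffers returned promptly
print(simulate(8, 2, 2, 16, replenish_every=8))  # buffers returned rarely
```

Even though the driver keeps up with processing in both runs, the second run exhausts the ring because processed buffers are handed back too rarely, which is the effect that more frequent replenishing is meant to avoid.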
Oooops, Yi's work is with 2200, not 3945! (My morning stupor at work there). But there sure are a lot of the spin_lock_irqsave()s in 3945/4965, too. -- Ben --
Hi, I just wanted to report that this same bug has also been filed in Launchpad: https://bugs.launchpad.net/ipw3945/+bug/109887 Thanks
The ipw3945 driver was replaced by iwl3945 in the official kernel a long time ago. We suggest using the iwl3945 driver instead of the obsolete ipw3945 driver. If you hit a bug there, please report it with product=iwlwifi and platform="Intel(R) Wifi Link 3945". Thanks so much!