Bugzilla – Bug 821
slab corruption using ipw2200 1.0.8
Last modified: 2006-02-14 23:10:24
Since upgrading from ipw2200 1.0.6 to 1.0.8 (ieee80211 1.0.3 to 1.1.6 and firmware 2.3 to 2.4) on 2.6.13, I've been getting all sorts of fun slab corruption issues. Most traces point to ipw2200, or to networking code whilst on wireless. Some slab corruption is detected just after connecting to a wireless network, some when rmmod'ing ipw2200, and occasionally much later. The kernel is not quite vanilla (applied patches include mppe, suspend2, lirc, DSDT-initrd and fbsplash), but the only thing that changed between the kernel version that panicked and the one that didn't was the ipw2200 and ieee80211 versions. Kernel config and traces are attached. The corruption shows up at least once an hour in normal usage. It occurs in both ad-hoc and infrastructure modes, with or without WEP.
Created an attachment (id=581): Kernel config
Created an attachment (id=582): slab corruption logs from kernel
I ran a vanilla 2.6.13 kernel with DEBUG_SLAB enabled and transferred a 1 GB file without getting any warnings like the ones you attached. I'm not saying it couldn't be a driver or stack problem, but can you try it on a vanilla kernel?
I've compiled vanilla 2.6.13 with ieee80211 1.1.6 and ipw2200 1.0.8 (straight from the tarballs, not the pre-prepared patches), but I won't get to test it thoroughly until tomorrow. I'm still definitely getting an oops when rmmod'ing ipw2200 after it has associated (tested by associating with an ad-hoc network), which *might* be related. I'll post if I find anything else in the morning. Thanks, Bernard.

Oops: 0002 [#1]
PREEMPT
Modules linked in: binfmt_misc iptable_filter ip_tables thermal fan button ac battery ipv6 capability commoncap pcmcia eth1394 joydev parport_pc parport rtc yenta_socket rsrc_nonstatic pcmcia_core ipw2200 ieee80211 ieee80211_crypt firmware_class 8139too mii ohci1394 ieee1394 8250_pci 8250 serial_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc usbhid ehci_hcd uhci_hcd deflate zlib_deflate twofish serpent aes_i586 blowfish des sha256 sha1 md5 crypto_null af_key nls_iso8859_1 nls_cp437 vfat fat dm_mod evdev i2c_algo_bit i2c_dev i2c_i801 i2c_core cpufreq_ondemand cpufreq_powersave visor usbserial usbcore acpi_cpufreq freq_table processor psmouse unix
CPU:    0
EIP:    0060:[pg0+820360304/1068491776]    Not tainted VLI
EFLAGS: 00010286   (2.6.13)
EIP is at ipw_pci_remove+0x1a0/0x290 [ipw2200]
eax: 6b6b6b6b   ebx: ee3ce738   ecx: 00000000   edx: 6b6b6b6b
esi: efce3160   edi: efce3160   ebp: ed5cdea8   esp: ed5cde80
ds: 007b   es: 007b   ss: 0068
Process rmmod (pid: 5528, threadinfo=ed5cc000 task=ed567000)
Stack: ee3ce728 ef602c70 efce3160 efce3160 00000018 efce270c c1b31200 c1b311bc
       f136f36c c1b31200 ed5cdeb8 c022a2bb c1b311bc c1b312c0 ed5cded4 c028cd4b
       c1b31200 c0366f25 c1b31200 f136f36c ed5cc000 ed5cdf0c c028cea6 c1b31200
Call Trace:
 [show_stack+127/160] show_stack+0x7f/0xa0
 [show_registers+343/448] show_registers+0x157/0x1c0
 [die+342/736] die+0x156/0x2e0
 [do_page_fault+857/1757] do_page_fault+0x359/0x6dd
 [error_code+79/84] error_code+0x4f/0x54
 [pci_device_remove+59/64] pci_device_remove+0x3b/0x40
 [__device_release_driver+139/144] __device_release_driver+0x8b/0x90
 [driver_detach+278/566] driver_detach+0x116/0x236
 [bus_remove_driver+103/144] bus_remove_driver+0x67/0x90
 [driver_unregister+20/48] driver_unregister+0x14/0x30
 [pci_unregister_driver+23/48] pci_unregister_driver+0x17/0x30
 [pg0+820362887/1068491776] ipw_exit+0x27/0x2b [ipw2200]
 [sys_delete_module+346/400] sys_delete_module+0x15a/0x190
 [syscall_call+7/11] syscall_call+0x7/0xb
Code: ec 8b 9c d0 94 09 00 00 3b 5d e4 8b 33 74 36 8b 7d e0 89 f6 8d bc 27 00 00 00 00 8d 43 f0 89 04 24 e8 95 8d e0 ce 8b 53 04 8b 03 <89> 02 89 50 04 c7 03 00 01 10 00 c7 43 04 00 02 20 00 89 f3 39
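A note for anyone decoding the oops above: eax and edx both hold 0x6b6b6b6b, which is the byte pattern slab debugging writes over freed objects. The bytes around the faulting instruction (8b 53 04 8b 03 <89> 02 89 50 04 c7 03 00 01 10 00 c7 43 04 00 02 20 00) look like the unlink-and-poison sequence of list_del(): load next/prev, store through prev, then write 0x00100100 and 0x00200200 (LIST_POISON1/LIST_POISON2) into the entry. One plausible reading, not confirmed anywhere in this thread, is that ipw_pci_remove() ran list_del() on an entry whose memory had already been freed. The user-space helpers below are a minimal sketch of the two checks behind that reading; the function names are hypothetical and the poison values are the 2.6-era i386 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte written over freed objects when slab debugging is enabled
 * (POISON_FREE in the kernel's slab debugging code). */
#define SLAB_POISON_FREE_BYTE 0x6bu

/* Values list_del() stores into a removed entry on 2.6-era kernels
 * (LIST_POISON1 and LIST_POISON2). */
#define LIST_POISON1_VAL 0x00100100u
#define LIST_POISON2_VAL 0x00200200u

/* True if every byte of a 32-bit register value equals the free poison,
 * i.e. the value was most likely loaded from already-freed slab memory. */
static bool looks_like_freed_slab(uint32_t reg)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if (((reg >> shift) & 0xffu) != SLAB_POISON_FREE_BYTE)
            return false;
    }
    return true;
}

/* True if the value is one of the list_del() poison pointers, i.e. the
 * list entry had already been unlinked once (a double list_del()). */
static bool looks_like_deleted_list_entry(uint32_t ptr)
{
    return ptr == LIST_POISON1_VAL || ptr == LIST_POISON2_VAL;
}
```

Applied to the register dump, looks_like_freed_slab() is true for eax and edx but false for ebx (0xee3ce738), while looks_like_deleted_list_entry() is false for both poisoned registers. That pattern points towards use-after-free rather than a double removal, which is the distinction the two checks are meant to draw.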
Created an attachment (id=595): Please check whether this patch fixes the rmmod oops.
Hi Bernard, just curious: do you have more than 1 GB of memory? I'm seeing a huge number of kernel panics, seemingly unrelated to ipw2200, when I run ipw2200 1.0.8 with ieee80211 1.1.6. It got so bad that it seriously messed up the filesystem on my laptop... I don't see any kernel panics running ipw2200 1.0.6 with ieee80211 1.0.3. I'm wondering whether there could be some kind of memory corruption on systems with more than 1 GB of memory, which might explain why it can't always be reproduced...
Yep, the patch fixes the rmmod oops. Thanks! I haven't seen any slab corruption yet. I'll reintroduce the other patches one by one to see which one is interacting, but it certainly seems like ipw2200 is not at fault. Sorry for the hassle. Nick: I have 1 GB of RAM. Do you have any interesting patches applied to your kernel? Could you turn on CONFIG_DEBUG_SLAB and see whether you get anything in dmesg similar to attachment #582 above?
Hardware: Acer TravelMate 4500 with 1.5 GB of memory. I'm running kernel linux-2.6.13-gentoo-r5, using the Gentoo patch set:
http://dev.gentoo.org/~dsd/genpatches/
http://dev.gentoo.org/~dsd/genpatches/patches-2.6.13-5.htm
I'll try enabling CONFIG_DEBUG_SLAB and retrying ipw2200 1.0.8. Hopefully I won't have to spend another half hour playing with fsck.ext3... :( (glutton for punishment...)
Ok, I'm seeing slab corruption too...
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log captured.
Slab corruption: start=f67ec000, len=16384
000: a0 a9 02 00 c8 00 00 00 68 a3 44 06 b1 00 00 00
010: 20 00 00 00 e8 a5 44 06 08 00 00 00 32 00 00 00
020: 04 a6 44 06 08 00 00 00 08 01 00 00 1e a6 44 06
030: 46 00 00 00 4a 00 00 00 23 a6 44 06 d0 a9 02 00
040: c8 00 00 00 90 a6 44 06 b1 00 00 00 20 00 00 00
050: b2 a7 44 06 08 00 00 00 32 00 00 00 bd a7 44 06
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
nickkral@doodad ~ $ uptime
 21:09:32 up 4 min, 6 users, load average: 0.99, 0.67, 0.28
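The "Slab corruption: start=..., len=..." report above comes from CONFIG_DEBUG_SLAB. When a debugged cache frees an object, the allocator fills it with 0x6b (POISON_FREE) and writes 0xa5 (POISON_END) into its last byte. Before handing the object out again, it rescans that pattern and, on a mismatch, prints the object's start and length followed by hex rows (an offset, then 16 bytes) covering the damaged region. None of the bytes in Nick's dump is 0x6b, which suggests that something wrote into that 16384-byte object after it had been freed. The sketch below is a simplified user-space model of this check, written only for illustration. The function names echo the kernel's slab code, but this is not the kernel implementation, and the row limit is a parameter rather than the kernel's own value.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POISON_FREE 0x6b /* fill byte for freed objects */
#define POISON_END  0xa5 /* marker stored in the last byte of a freed object */

/* Fill a freed object with the poison pattern, as slab debugging does. */
static void poison_obj(unsigned char *obj, size_t size)
{
    memset(obj, POISON_FREE, size);
    obj[size - 1] = POISON_END;
}

/* Re-check a supposedly untouched free object. Each 16-byte row containing
 * a wrong byte is printed once, up to max_lines rows. Returns the number of
 * rows reported (0 means the poison pattern is intact). */
static int check_poison_obj(const unsigned char *obj, size_t size, int max_lines)
{
    int lines = 0;
    size_t i = 0;

    while (i < size && lines < max_lines) {
        unsigned char expected = (i == size - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expected) {
            size_t row = (i / 16) * 16;
            size_t limit = (row + 16 > size) ? size - row : 16;

            if (lines == 0)
                printf("Slab corruption: start=%p, len=%zu\n",
                       (const void *)obj, size);
            printf("%03zx:", row);
            for (size_t j = 0; j < limit; j++)
                printf(" %02x", obj[row + j]);
            printf("\n");
            lines++;
            i = row + 16; /* skip the rest of this row; it has been dumped */
        } else {
            i++;
        }
    }
    return lines;
}
```

For example, poisoning a 64-byte buffer and then setting byte 0 to 0xa0 and byte 17 to 0x00 (a stand-in for a stray write after free) makes check_poison_obj() report two rows, labelled 000: and 010:.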
Created an attachment (id=596): Nick's kernel config
Created an attachment (id=597): Nick's ipw2200 firmware restart error file
Created an attachment (id=598): Nick's computer after reboot, debugging with 0x43FFF. I enabled debug=0x43FFF and rebooted my computer. After a few seconds to minutes of copying a file over NFS on the local network, I got slab corruption and errors. See the attached /var/log/messages file for the slab messages and firmware restarts.
Created an attachment (id=611): debug patch. Can you please try the patch and see whether it stops the slab corruption?
Created an attachment (id=612): updated version
Created an attachment (id=614): fix patch. I think the problem is caused by the address offset calculation for the error log buffer. Please verify whether this patch fixes the problem.
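The fix itself (attachments 614 and 615) is not reproduced in this thread, so the exact offending line is unknown. As a purely hypothetical illustration of the bug class, the sketch below builds a single allocation holding a header followed by two variable-length arrays, loosely in the spirit of a captured firmware error log. All structure, field, and function names are invented for this example and are not the ipw2200 ones. The key point is that the second array's start must be computed in the same units as the allocation size. If bytes and records get mixed up, the log pointer either overlaps the first array or points past the end of the buffer, and copying the log through it silently overwrites whatever the allocator placed next.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record layouts, invented for this illustration. */
struct error_elem  { uint32_t desc, time, blink1, blink2, link1, link2, data; };
struct event_entry { uint32_t time, data; uint16_t event_id, pad; };

struct error_dump {
    uint32_t elem_len;        /* number of error_elem records */
    uint32_t log_len;         /* number of event_entry records */
    struct error_elem *elem;  /* points into payload[] */
    struct event_entry *log;  /* points into payload[], right after elem[] */
    unsigned char payload[];  /* elem[elem_len] followed by log[log_len] */
};

/* Allocate one buffer large enough for the header and both arrays, then
 * point elem and log at their slices of payload[]. */
static struct error_dump *alloc_error_dump(uint32_t elem_len, uint32_t log_len)
{
    size_t bytes = sizeof(struct error_dump)
                 + (size_t)elem_len * sizeof(struct error_elem)
                 + (size_t)log_len * sizeof(struct event_entry);
    struct error_dump *d = malloc(bytes);

    if (!d)
        return NULL;
    d->elem_len = elem_len;
    d->log_len = log_len;
    d->elem = (struct error_elem *)d->payload;

    /* Correct: typed pointer arithmetic advances by elem_len *records*.
     * Unit mix-ups such as (d->payload + elem_len), which is too small and
     * overlaps elem[], or (d->elem + elem_len * sizeof(struct error_elem)),
     * which is too large and lands past the end of the buffer, misplace
     * log[]; writes through it then corrupt neighbouring memory. */
    d->log = (struct event_entry *)(d->elem + elem_len);
    return d;
}
```

The design choice here is to derive every pointer from the same record counts that sized the allocation, so the layout and the size cannot drift apart. Whether the real ipw2200 bug had this shape is an assumption.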
Created an attachment (id=615): better fix
I think you might've hit the nail on the head with this patch! Looking at my old logs (which I should've posted more of, sorry!), the slab corruption indeed only ever occurred after a firmware restart. I've had several firmware restarts now without any slab corruption. It's curious that I couldn't reproduce it without the suspend2 patch. Perhaps I was just unlucky? Good catch =) Thanks!
Attachment #615 seems to fix the random oopses I had with 2.6.14 + suspend2 and ipw2200-1.0.8 :)
PATCH_EXIST. Need to test this patch.
Marking as fixed.
fixed.
I'm running with the patch, and my random oops / slab corruption problems seem to be solved.
Works for me. Marking verified based on number of successful replies.
I am still having the problems using 1.0.8 with the txbusy and slab corruption patches, same as everyone else: it dies when there's a lot of data. I tried changing txqueuelen, which was suggested in one post, but it didn't help. Sometimes it dies completely and I need to re-modprobe it, and sometimes it just stalls for 5-10 seconds.
Sebastian, does it "die" in the kernel oops/panic sense, or does it stop transferring traffic? Are you getting firmware restarts? Any logs you can post? I doubt the slab corruption bug would exhibit your symptoms, particularly if rmmod'ing and modprobing fixes it. The key signs of slab corruption are random oopses in dmesg (or, if CONFIG_DEBUG_SLAB is turned on, slab corruption warnings). TIA, Bernard.
Dec 26 04:13:25 [kernel] ipw2200: Sysfs 'error' log already exists.
Dec 26 04:25:04 [kernel] ipw2200: Firmware error detected. Restarting.
Dec 26 04:25:04 [kernel] ipw2200: Sysfs 'error' log already exists.
Dec 26 04:25:19 [kernel] ipw2200: Failed to send ASSOCIATE: Command timed out.
Dec 26 04:25:42 [kernel] ipw2200: Failed to send CARD_DISABLE: Command timed out
I'll activate debugging now and post more info next time the problem occurs. It's easy to replicate, so I'll have more info soon.
Got a dump now ;) Also, I'm currently using 2.6.15-rc6-git3. I thought the driver in that kernel would be better than the one I was using previously, but there was no change. Before that I used 2.6.14.4, which had the same problem, except that I think it didn't occur as frequently as in the kernel I'm using now. The driver I'm currently using is 1.0.8-r1. The r1 is from Gentoo Portage and includes the following patches:
ipw2200-1.0.8-txbusy.patch
ipw2200-1.0.8-broadcast.patch
ipw2200-1.0.8-slabcorrupt.patch
Created an attachment (id=642): dump from dmesg
Hyrwall, this is a firmware problem, not slab corruption. Could you add your comments to the firmware bugs, or open a new bug? Thank you!
Slab corruption is fixed in 1.0.10. Please verify!
For me, it is not fixed in 1.0.10. I don't get any crashes, but dmesg shows a lot of messages like:
ipw2200: Firmware error detected. Restarting.
ipw2200: Sysfs 'error' log already exists.
After that, the card is associated and the interface is up, but all packets are lost. The kernel is 2.6.14.3 with software suspend compiled in (but not enabled). I didn't notice this problem while using a no-name router without encryption; I'm now using a 128-bit WEP connection on a D-Link router.
(In reply to comment #31)
> For me, it is not fixed in 1.0.10. I don't get any crashes, but dmesg shows a
> lot of messages like:
> ipw2200: Firmware error detected. Restarting.
> ipw2200: Sysfs 'error' log already exists.

That's not related to this bug. Please see bug #802.
Re-marking as verified.
*** Bug 812 has been marked as a duplicate of this bug. ***