Bug 821 - slab corruption using ipw2200 1.0.8
Status: VERIFIED FIXED
: IPW2200
__UNSPECIFIED__
: 1.0.8
: ACER Debian
: P2 major
Reported: 2005-10-27 08:14 by
Modified: 2006-02-14 23:10


Attachments
Kernel config (44.96 KB, text/plain)
2005-10-27 08:15, Bernard Blackham
slab corruption logs from kernel (18.62 KB, text/plain)
2005-10-27 08:15, Bernard Blackham
please try if this patch fix the rmmod oops (473 bytes, patch)
2005-11-03 22:36, Zhu Yi
Nick's kernel config (49.89 KB, text/plain)
2005-11-05 21:34, Nick Kralevich
Nick's ipw2200 firmware restart error file (3.44 KB, application/octet-stream)
2005-11-05 21:42, Nick Kralevich
Nick's computer after reboot -- debugging with 0x43FFF (808.54 KB, text/plain)
2005-11-05 22:07, Nick Kralevich
debug patch (431 bytes, patch)
2005-11-15 21:51, Zhu Yi
updated one (644 bytes, patch)
2005-11-15 23:51, Zhu Yi
fix patch (515 bytes, patch)
2005-11-16 07:06, Zhu Yi
better fix (522 bytes, patch)
2005-11-16 23:04, Zhu Yi
dump from dmesg (15.57 KB, text/plain)
2005-12-27 16:50, Sebastian Hyrwall




Description From 2005-10-27 08:14:32
Since upgrading from ipw2200 1.0.6 to 1.0.8 (ieee80211 1.0.3 to 1.1.6 and
firmware 2.3 to 2.4) on 2.6.13, I've been getting all sorts of fun slab
corruption issues.  Most traces point to ipw2200, or networking code whilst on
wireless.  Some slab corruption is detected just after connecting to wireless,
some when rmmod'ing ipw2200, and occasionally much later.

Kernel is not quite vanilla - applied patches include mppe, suspend2, lirc,
DSDT-initrd and fbsplash, but the only thing that changed between the kernel
version that panicked and the one that didn't was the ipw2200 and ieee80211
versions.

Kernel config and traces attached.

The corruption presents itself at least once in an hour's normal
usage. It occurs in both ad-hoc and infrastructure modes, with or without WEP.
------- Comment #1 From 2005-10-27 08:15:15 -------
Created an attachment (id=581)
Kernel config
------- Comment #2 From 2005-10-27 08:15:46 -------
Created an attachment (id=582)
slab corruption logs from kernel
------- Comment #3 From 2005-10-31 01:22:00 -------
I ran vanilla 2.6.13 with DEBUG_SLAB enabled and transferred a 1G file without any
warning messages like the ones you attached. I'm not saying it could not be a driver or
stack problem, but can you try it on a vanilla kernel?
------- Comment #4 From 2005-10-31 07:22:04 -------
I've compiled up vanilla 2.6.13 with ieee80211 1.1.6 and ipw2200 1.0.8 (straight
from the tarball, not preprepared patches), but I won't get to test it
thoroughly until tomorrow.

I'm still definitely getting an Oops on rmmod'ing ipw2200 if it has associated
(tested associating with an ad-hoc network), which *might* be related. I'll post
if I get anything else in the morning.

Thanks,
Bernard.

Oops: 0002 [#1]
PREEMPT 
Modules linked in: binfmt_misc iptable_filter ip_tables thermal fan button ac
battery ipv6 capability commoncap pcmcia eth1394 joydev parport_pc parport rtc
yenta_socket rsrc_nonstatic pcmcia_core ipw2200 ieee80211 ieee80211_crypt
firmware_class 8139too mii ohci1394 ieee1394 8250_pci 8250 serial_core
snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc usbhid ehci_hcd uhci_hcd deflate
zlib_deflate twofish serpent aes_i586 blowfish des sha256 sha1 md5 crypto_null
af_key nls_iso8859_1 nls_cp437 vfat fat dm_mod evdev i2c_algo_bit i2c_dev
i2c_i801 i2c_core cpufreq_ondemand cpufreq_powersave visor usbserial usbcore
acpi_cpufreq freq_table processor psmouse unix
CPU:    0
EIP:    0060:[pg0+820360304/1068491776]    Not tainted VLI
EFLAGS: 00010286   (2.6.13) 
EIP is at ipw_pci_remove+0x1a0/0x290 [ipw2200]
eax: 6b6b6b6b   ebx: ee3ce738   ecx: 00000000   edx: 6b6b6b6b   
esi: efce3160   edi: efce3160   ebp: ed5cdea8   esp: ed5cde80   
ds: 007b   es: 007b   ss: 0068
Process rmmod (pid: 5528, threadinfo=ed5cc000 task=ed567000)
Stack: ee3ce728 ef602c70 efce3160 efce3160 00000018 efce270c c1b31200 c1b311bc 
       f136f36c c1b31200 ed5cdeb8 c022a2bb c1b311bc c1b312c0 ed5cded4 c028cd4b 
       c1b31200 c0366f25 c1b31200 f136f36c ed5cc000 ed5cdf0c c028cea6 c1b31200 
Call Trace:
 [show_stack+127/160] show_stack+0x7f/0xa0
 [show_registers+343/448] show_registers+0x157/0x1c0
 [die+342/736] die+0x156/0x2e0
 [do_page_fault+857/1757] do_page_fault+0x359/0x6dd
 [error_code+79/84] error_code+0x4f/0x54
 [pci_device_remove+59/64] pci_device_remove+0x3b/0x40
 [__device_release_driver+139/144] __device_release_driver+0x8b/0x90
 [driver_detach+278/566] driver_detach+0x116/0x236
 [bus_remove_driver+103/144] bus_remove_driver+0x67/0x90
 [driver_unregister+20/48] driver_unregister+0x14/0x30
 [pci_unregister_driver+23/48] pci_unregister_driver+0x17/0x30 
 [pg0+820362887/1068491776] ipw_exit+0x27/0x2b [ipw2200]
 [sys_delete_module+346/400] sys_delete_module+0x15a/0x190
 [syscall_call+7/11] syscall_call+0x7/0xb
Code: ec 8b 9c d0 94 09 00 00 3b 5d e4 8b 33 74 36 8b 7d e0 89 f6 8d bc 27 00 00
00 00 8d 43 f0 89 04 24 e8 95 8d e0 ce 8b 53 04 8b 03 <89> 02 89 50 04 c7 03 00
01 10 00 c7 43 04 00 02 20 00 89 f3 39
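
Annotation on the oops above (not part of the original comment): eax and edx both hold 6b6b6b6b, which matches the byte that the kernel's slab debugging writes over freed objects (POISON_FREE, 0x6b). The bytes following the faulting instruction also contain the little-endian constants 00100100 and 00200200, the kernel's list-poison values. That pattern would be consistent with a list entry being unlinked after its memory had already been freed, although nobody in the thread confirms this reading. The Python sketch below is a hypothetical helper (all function names are made up for illustration) that flags such values in a register dump.

```python
import re

# Byte patterns written by the kernel's slab debugging. The values are the
# well-known poison constants; the descriptions are informal paraphrases.
SLAB_POISON = {
    0x6B: "POISON_FREE: object was already freed",
    0x5A: "POISON_INUSE: object allocated but never initialised",
}

# Values written into list_head pointers when an entry is deleted.
LIST_POISON = {
    0x00100100: "LIST_POISON1: next pointer of a deleted list entry",
    0x00200200: "LIST_POISON2: prev pointer of a deleted list entry",
}

# Matches i386 general-purpose registers such as "eax: 6b6b6b6b".
# Segment registers ("ds: 007b") are deliberately not matched.
REGISTER_RE = re.compile(r"\b(e[a-z]{2}):\s+([0-9a-f]{8})\b")


def parse_registers(text):
    """Return {register name: integer value} for an oops register dump."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(text)}


def classify(value):
    """Return a description if a 32-bit value looks like poison, else None."""
    if value in LIST_POISON:
        return LIST_POISON[value]
    raw = value.to_bytes(4, "little")
    if len(set(raw)) == 1:  # all four bytes identical
        return SLAB_POISON.get(raw[0])
    return None


def suspicious_registers(text):
    """Return {register name: description} for registers holding poison."""
    found = {}
    for name, value in parse_registers(text).items():
        meaning = classify(value)
        if meaning is not None:
            found[name] = meaning
    return found


# Register lines copied from the oops in comment #4.
OOPS_REGISTERS = """\
eax: 6b6b6b6b   ebx: ee3ce738   ecx: 00000000   edx: 6b6b6b6b
esi: efce3160   edi: efce3160   ebp: ed5cdea8   esp: ed5cde80
ds: 007b   es: 007b   ss: 0068
"""

if __name__ == "__main__":
    for name, meaning in suspicious_registers(OOPS_REGISTERS).items():
        print(name, meaning)
```

A helper like this only narrows down where to look; the actual fix for the rmmod oops is the patch attached in comment #5.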
------- Comment #5 From 2005-11-03 22:36:01 -------
Created an attachment (id=595)
please try if this patch fix the rmmod oops
------- Comment #6 From 2005-11-04 19:45:56 -------
Hi Bernard,

Just curious, do you have > 1G of memory?  

I'm seeing a huge number of kernel panics, seemingly unrelated to ipw2200, when
I try to run ipw2200 1.0.8, ieee80211 1.1.6.  It got so bad that it seriously
messed up my filesystem on my laptop...

I don't see any kernel panics running ipw2200 1.0.6, ieee 1.0.3. 

I'm wondering if there could be some kind of memory corruption for systems with
greater than 1GB of memory, which may explain why it can't always be reproduced...
------- Comment #7 From 2005-11-05 03:00:56 -------
Yep, the patch fixes the rmmod oops. Thanks!

I haven't yet seen any slab corruption. I'll reintroduce the other patches one
by one to see which one's interacting, but it certainly seems like ipw2200 is
not at fault. Sorry for the hassle.

Nick - I have 1GB of RAM. Do you have any interesting patches applied to your
kernel? Did you want to turn on CONFIG_DEBUG_SLAB and see if you get anything
similar in dmesg to Attachment #582 above?
------- Comment #8 From 2005-11-05 20:52:45 -------
Hardware:  Acer Travelmate 4500 with 1.5G of memory.

I'm running kernel linux-2.6.13-gentoo-r5, using the gentoo patch set.

http://dev.gentoo.org/~dsd/genpatches/
http://dev.gentoo.org/~dsd/genpatches/patches-2.6.13-5.htm

I'll try enabling CONFIG_DEBUG_SLAB and retry ipw2200 1.0.8.  Hopefully I won't
have to spend another half an hour playing with fsck.ext3....  :(  (glutton for
punishment...)
------- Comment #9 From 2005-11-05 21:33:20 -------
Ok, I'm seeing slab corruption too...

ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log captured.
Slab corruption: start=f67ec000, len=16384
000: a0 a9 02 00 c8 00 00 00 68 a3 44 06 b1 00 00 00
010: 20 00 00 00 e8 a5 44 06 08 00 00 00 32 00 00 00
020: 04 a6 44 06 08 00 00 00 08 01 00 00 1e a6 44 06
030: 46 00 00 00 4a 00 00 00 23 a6 44 06 d0 a9 02 00
040: c8 00 00 00 90 a6 44 06 b1 00 00 00 20 00 00 00
050: b2 a7 44 06 08 00 00 00 32 00 00 00 bd a7 44 06
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
ipw2200: Firmware error detected.  Restarting.
ipw2200: Sysfs 'error' log already exists.
nickkral@doodad ~ $ uptime
 21:09:32 up 4 min,  6 users,  load average: 0.99, 0.67, 0.28
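
Annotation on the dump above (not part of the original comment): with slab debugging enabled, a freed object is expected to contain only the poison byte 0x6b, so the bytes printed here were written after the free. Read as 32-bit little-endian words, every third word increases steadily, which looks like a timestamp field inside fixed 12-byte records. That is only an observation about these six lines; the record layout is an assumption, not something the thread states. A minimal Python sketch for decoding the dump:

```python
import struct

# The six dump lines from comment #9, copied verbatim.
DUMP = """\
000: a0 a9 02 00 c8 00 00 00 68 a3 44 06 b1 00 00 00
010: 20 00 00 00 e8 a5 44 06 08 00 00 00 32 00 00 00
020: 04 a6 44 06 08 00 00 00 08 01 00 00 1e a6 44 06
030: 46 00 00 00 4a 00 00 00 23 a6 44 06 d0 a9 02 00
040: c8 00 00 00 90 a6 44 06 b1 00 00 00 20 00 00 00
050: b2 a7 44 06 08 00 00 00 32 00 00 00 bd a7 44 06
"""


def dump_to_words(dump):
    """Decode 'offset: byte byte ...' lines into 32-bit little-endian words."""
    raw = bytearray()
    for line in dump.splitlines():
        _, _, hex_part = line.partition(":")
        raw += bytes.fromhex("".join(hex_part.split()))
    count = len(raw) // 4
    return list(struct.unpack("<%dI" % count, bytes(raw[: count * 4])))


def every_third_word(words, start=2):
    """Return words[start], words[start + 3], ... (the suspected time field)."""
    return words[start::3]


if __name__ == "__main__":
    words = dump_to_words(DUMP)
    for i in range(0, len(words), 3):
        print(" ".join("%08x" % w for w in words[i:i + 3]))
    stamps = every_third_word(words)
    print("third column strictly increasing:",
          all(a < b for a, b in zip(stamps, stamps[1:])))
```

Structured data of this kind landing in freed memory fits the later diagnosis in comment #15, which attributes the problem to the error log buffer address offset calculation.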

------- Comment #10 From 2005-11-05 21:34:06 -------
Created an attachment (id=596)
Nick's kernel config
------- Comment #11 From 2005-11-05 21:42:35 -------
Created an attachment (id=597)
Nick's ipw2200 firmware restart error file
------- Comment #12 From 2005-11-05 22:07:10 -------
Created an attachment (id=598)
Nick's computer after reboot -- debugging with 0x43FFF

I enabled debug=0x43FFF and rebooted my computer.  After a few seconds /
minutes of copying a file via NFS on the local network, I got slab corruption
and errors.  See attached /var/log/messages file for the slab messages and
firmware restarts.
------- Comment #13 From 2005-11-15 21:51:05 -------
Created an attachment (id=611)
debug patch

Can you please try the patch and see if it stops the slab corruption?
------- Comment #14 From 2005-11-15 23:51:09 -------
Created an attachment (id=612)
updated one
------- Comment #15 From 2005-11-16 07:06:44 -------
Created an attachment (id=614)
fix patch

I think the problem is caused by the error log buffer address offset
calculation. Please verify whether this patch fixes the problem.
------- Comment #16 From 2005-11-16 23:04:42 -------
Created an attachment (id=615)
better fix
------- Comment #17 From 2005-11-17 02:34:17 -------
I think you might've nailed it on the head with this patch!

Looking at my old logs (which I should've posted more of, sorry!), the slab
corruption indeed only ever occurred after a firmware restart. I've had several
firmware restarts now without any slab corruption.

It's curious that I couldn't reproduce it without the suspend2 patch. Perhaps I
was just unlucky?

Good catch =)

Thanks!
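
Annotation (not part of the original comment): the claim that slab corruption only ever appeared after a firmware restart can be checked mechanically against a saved kernel log. The Python sketch below assumes the two message prefixes quoted in this thread and uses a made-up function name; it reports any corruption line that has no restart before it.

```python
# Message fragments as they appear in the logs quoted in this thread.
RESTART = "ipw2200: Firmware error detected"
CORRUPTION = "Slab corruption:"


def corruption_without_restart(lines):
    """Return indexes of corruption reports with no earlier firmware restart."""
    seen_restart = False
    orphans = []
    for index, line in enumerate(lines):
        if RESTART in line:
            seen_restart = True
        elif CORRUPTION in line and not seen_restart:
            orphans.append(index)
    return orphans


# First lines of the log posted in comment #9.
SAMPLE = [
    "ipw2200: Firmware error detected.  Restarting.",
    "ipw2200: Sysfs 'error' log captured.",
    "Slab corruption: start=f67ec000, len=16384",
]

if __name__ == "__main__":
    orphans = corruption_without_restart(SAMPLE)
    print("corruption reports without a prior restart:", orphans)
```

An empty result supports the observation for that log but does not prove causation; a non-empty result would point to a second source of corruption.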
------- Comment #18 From 2005-11-17 04:37:05 -------
Attachment #615 seems to fix the random oopses I had with 2.6.14 + suspend2 and
ipw2200-1.0.8 :)
------- Comment #19 From 2005-11-17 07:55:04 -------
PATCH_EXIST. Need to test this patch.
------- Comment #20 From 2005-11-17 10:48:54 -------
to mark as fixed
------- Comment #21 From 2005-11-17 10:49:10 -------
fixed.
------- Comment #22 From 2005-11-19 11:13:56 -------
I'm running with the patch, and my random oops / slab corruption problems seem
to be solved.
------- Comment #23 From 2005-12-07 15:16:25 -------
Works for me. Marking verified based on number of successful replies.
------- Comment #24 From 2005-12-26 20:18:03 -------
I am still having problems using 1.0.8 with the txbusy and
slabcorruption patches. Same as everyone else: it dies when there's a lot of data.
I tried changing txqueuelen, which was suggested in one post. It didn't help.

Sometimes it dies totally and I need to re-modprobe it, and sometimes it just
stalls for 5-10 seconds.
------- Comment #25 From 2005-12-26 23:51:03 -------
Sebastian,

Does it "die" in the kernel oops/panic sense? Or does it stop transferring traffic?

Are you getting firmware restarts?

Any logs you can post?

I doubt the slab corruption bug would exhibit your symptoms - particularly if
rmmod'ing and modprobing fixes it. The key signs of slab corruption are things
in dmesg indicating random oopses (or, if CONFIG_DEBUG_SLAB is turned on, slab
corruption warnings).

TIA,
Bernard.
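
Annotation (not part of the original comment): the distinction drawn above, memory corruption versus a firmware problem, can be sketched as a small log triage in Python. The message fragments are the ones quoted in this thread; the labels and function names are hypothetical.

```python
from collections import Counter

# (label, fragment) pairs; the first matching fragment decides the label.
SIGNATURES = [
    ("slab_corruption", "Slab corruption:"),
    ("oops", "Oops:"),
    ("firmware_restart", "Firmware error detected"),
    ("command_timeout", "Command timed out"),
]


def triage(lines):
    """Count how many log lines match each known signature."""
    counts = Counter()
    for line in lines:
        for label, fragment in SIGNATURES:
            if fragment in line:
                counts[label] += 1
                break
    return counts


def looks_like_memory_corruption(counts):
    """True if the log shows oopses or slab corruption warnings."""
    return counts["slab_corruption"] > 0 or counts["oops"] > 0


# Log lines of the kind reported for the stalls discussed here.
SAMPLE = [
    "[kernel] ipw2200: Sysfs 'error' log already exists.",
    "[kernel] ipw2200: Firmware error detected.  Restarting.",
    "[kernel] ipw2200: Sysfs 'error' log already exists.",
    "[kernel] ipw2200: Failed to send ASSOCIATE: Command timed out.",
    "[kernel] ipw2200: Failed to send CARD_DISABLE: Command timed out",
]

if __name__ == "__main__":
    counts = triage(SAMPLE)
    print(dict(counts))
    print("memory corruption suspected:", looks_like_memory_corruption(counts))
```

A log with only firmware restarts and command timeouts, as in the sample, would point away from this bug and toward a separate firmware issue.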
------- Comment #26 From 2005-12-27 16:32:22 -------
Dec 26 04:13:25 [kernel] ipw2200: Sysfs 'error' log already exists.
Dec 26 04:25:04 [kernel] ipw2200: Firmware error detected.  Restarting.
Dec 26 04:25:04 [kernel] ipw2200: Sysfs 'error' log already exists.
Dec 26 04:25:19 [kernel] ipw2200: Failed to send ASSOCIATE: Command timed out.
Dec 26 04:25:42 [kernel] ipw2200: Failed to send CARD_DISABLE: Command timed out


I'll activate debug now and post more info next time the problem occurs. It's
easy to replicate, so I'll have more info soon.
------- Comment #27 From 2005-12-27 16:49:09 -------
Got some dump now ;) 

Also, I'm now using 2.6.15-rc6-git3. I thought that the driver in that
kernel would be better than the one I was previously using, but there wasn't any
change.

Before that I used 2.6.14.4. Same problem there, except that I think it didn't
occur as frequently as in the kernel I'm using now.

The driver I'm right now using is 1.0.8-r1.  The r1 is from gentoo portage and
includes the following patches :

ipw2200-1.0.8-txbusy.patch                                     
ipw2200-1.0.8-broadcast.patch                                  
ipw2200-1.0.8-slabcorrupt.patch    
------- Comment #28 From 2005-12-27 16:50:07 -------
Created an attachment (id=642)
dump from dmesg

Dmesg dump
------- Comment #29 From 2005-12-29 07:47:52 -------
Hyrwall,

This problem is a firmware problem, not slab corruption.
Could you add comments to the firmware bugs, or open a new bug?

Thank you!
------- Comment #30 From 2006-01-12 17:45:17 -------
Slab corruption is fixed in 1.0.10.
Please verify!
------- Comment #31 From 2006-02-03 10:26:53 -------
For me, it is not fixed in 1.0.10. I don't get any crashes, but dmesg shows a   
lot of messages like:   
ipw2200: Firmware error detected.  Restarting.   
ipw2200: Sysfs 'error' log already exists.   
   
After that, the card is associated, interface is up, but all the packets are   
lost. The kernel is 2.6.14.3 with software suspend compiled in (not enabled,   
though).  
I didn't notice this problem while using a no-name router without encryption.  
I'm now using a 128-bit WEP connection on a D-Link router. 
 
------- Comment #32 From 2006-02-03 11:17:51 -------
(In reply to comment #31)
> For me, it is not fixed in 1.0.10. I don't get any crashes, but dmesg shows a   
> lot of messages like:   
> ipw2200: Firmware error detected.  Restarting.   
> ipw2200: Sysfs 'error' log already exists.   

That's not related to this bug. Please see bug #802.
------- Comment #33 From 2006-02-03 11:18:16 -------
Remarking as verified.
------- Comment #34 From 2006-02-14 23:10:24 -------
*** Bug 812 has been marked as a duplicate of this bug. ***