Bug 667 - "No space for Tx" error when hwcrypto is enabled
Status: RESOLVED NEEDSMOREDATA
: IPW2200
Driver Load
: 1.0.4
: IBM Debian
: P1 normal
: http://chriscarey.us/hardware/myhardw...
Reported: 2005-05-18 22:02 by
Modified: 2006-10-10 17:20 (History)


Attachments
syslog output 1 (191.58 KB, text/plain), 2005-05-19 00:50, Christopher Carey
syslog output 2 (1.85 KB, text/plain), 2005-05-19 00:50, Christopher Carey
syslog output 3 (389.71 KB, text/plain), 2005-05-19 00:50, Christopher Carey
dmesg output without debug (1.50 KB, text/plain), 2005-05-19 02:27, Henrik Brix Andersen
fix patch (1.26 KB, patch), 2005-06-08 05:00, Zhu Yi
dmesg output without patch (43.82 KB, text/plain), 2005-06-16 03:18, Patrick Renkowski
dmesg output with patch (34.47 KB, text/plain), 2005-06-16 03:19, Patrick Renkowski
debug information using debug=0x43fff (127.67 KB, text/plain), 2005-07-03 20:52, LDB
debug information using debug=0x43fff (127.67 KB, text/plain), 2005-07-03 20:55, LDB
debug information using debug=0x43fff (127.67 KB, text/plain), 2005-07-03 21:00, LDB
a patch to try (4.45 KB, patch), 2005-07-06 04:07, Zhu Yi




Description From 2005-05-18 22:02:22
I have come across many issues with the 1.0.4 driver: multiple errors and multiple
firmware resets. I've been able to get it to connect, but if I unload and reload
it, it may not reconnect the next time.

The debug logs from syslog are quite long and I'm not familiar with Bugzilla, so I
put them on my website:

http://chriscarey.us/hardware/myhardware/thinkpad-t41/ipw2200/
------- Comment #1 From 2005-05-19 00:50:17 -------
Created an attachment (id=385) [details]
syslog output 1
------- Comment #2 From 2005-05-19 00:50:34 -------
Created an attachment (id=386) [details]
syslog output 2
------- Comment #3 From 2005-05-19 00:50:48 -------
Created an attachment (id=387) [details]
syslog output 3
------- Comment #4 From 2005-05-19 00:54:43 -------
I have two access points in WDS mode. They both have the same SSID and are both
within range. I'm not sure whether that contributes to the error, but I thought I'd
point it out because it is the only non-traditional part of my setup.
------- Comment #5 From 2005-05-19 02:25:59 -------
I have seen the command failure issue with ipw2200-1.0.4 here as well, which
results in the 'no space for tx' message.
------- Comment #6 From 2005-05-19 02:27:40 -------
Created an attachment (id=388) [details]
dmesg output without debug

I have no idea what debug level might yield anything interesting for the above
situation, please advise.
------- Comment #7 From 2005-05-19 02:32:18 -------
echo 100 > /sys/class/firmware/timeout has improved reliability for me. I was
able to remove and modprobe the driver successfully. Once the card is connected
to the access point, the errors don't continue and things seem OK. It is during
the scanning phase that they show up.
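
A minimal sketch of the workaround above, wrapped in a function so it can be
dry-run against a scratch file. The function name set_fw_timeout is
hypothetical; the default path and the value 100 are the ones quoted in this
comment. Writing to the real sysfs file needs root.

```sh
#!/bin/sh
# Sketch only: raise the firmware-loader timeout before reloading the driver.
# $1 = timeout in seconds
# $2 = file to write (optional; defaults to the sysfs path from the comment)
set_fw_timeout() {
    seconds=$1
    path=${2:-/sys/class/firmware/timeout}
    # Refuse early if the target is not writable (for example, not root).
    [ -w "$path" ] || { echo "cannot write $path" >&2; return 1; }
    echo "$seconds" > "$path"
}
```

Typical use would be "set_fw_timeout 100" followed by removing and loading the
module again (modprobe -r ipw2200; modprobe ipw2200).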

------- Comment #8 From 2005-05-19 08:59:38 -------
The best debug level (for 99% of the issues) is 0x43fff. That captures the
extra firmware data during restarts and also traces all internally and
externally invoked changes to the state of the driver, allowing us to follow
the configuration logic.
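
For readers wondering what 0x43fff selects: it is a bitmask, and the sketch
below only lists which bit positions are set. The helper name mask_bits is
hypothetical, and the meaning of each bit is defined in the driver source and
is not reproduced here.

```sh
#!/bin/sh
# Sketch only: print the set bit positions of a debug mask, lowest first.
# $1 = mask, decimal or 0x-prefixed hex
mask_bits() {
    mask=$(($1))
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            # Append the bit index, space-separated.
            out="$out${out:+ }$bit"
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
    echo "$out"
}

# mask_bits 0x43fff   prints: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 18
```

So the mask is the low fourteen bits (0x3fff) plus bit 18 (0x40000).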
------- Comment #9 From 2005-05-27 08:56:28 -------
changing title, per bug scrub
------- Comment #10 From 2005-06-08 05:00:57 -------
Created an attachment (id=414) [details]
fix patch
------- Comment #11 From 2005-06-08 16:26:42 -------
v1.0.4 is completely unreliable for me due to all the firmware errors. Others in
my office have seen the same thing.

I'm applying the patch just posted today and I'll test it tomorrow.
------- Comment #12 From 2005-06-09 01:53:07 -------
The patch (attachment #414 [details]) seems to do it here. I have not seen that error
message since.
------- Comment #13 From 2005-06-09 12:10:34 -------
Darn, just encountered the problem again on system boot. Unfortunately, I didn't
catch the error log. Seems the proposed patch doesn't work anyway :/
------- Comment #14 From 2005-06-09 14:05:36 -------
I can trigger a firmware restart with 1.0.4+patch (applied with some offset
fuzz) within 1-3 minutes by trying to transfer a large file to my laptop.

This is no change from just plain old 1.0.4.
------- Comment #15 From 2005-06-09 19:44:29 -------
Does it happen only when you are using WPA? Please provide dmesg with debug
level 0x43fff for the patched version.
------- Comment #16 From 2005-06-10 09:13:33 -------
I'm just using 128bit WEP. I'll try to get some debug.

Is the proper way "modprobe ipw2200 debug=0x43fff" ??
------- Comment #17 From 2005-06-10 09:20:18 -------
(In reply to comment #16)
> I'm just using 128bit WEP. I'll try to get some debug.
> 
> Is the proper way "modprobe ipw2200 debug=0x43fff" ??

It must not be, because I loaded it with that command, then transferred a large
file which, several minutes into the transfer, only resulted in this output:

ipw2200: Firmware error detected.  Restarting.
------- Comment #18 From 2005-06-12 22:58:49 -------
Dax, if you only got firmware errors and not "No space for Tx", bug 697 should
be the right place for your comment. It seems you didn't enable CONFIG_IPW_DEBUG
in your Makefile (if you have ipw2100 enabled in your kernel .config, you should
enable it there).

Brix, do you use encryption when you see the "No space for Tx" warning?
------- Comment #19 From 2005-06-13 04:31:08 -------
It happens both with and without encryption - but I see it very rarely, and have
not been able to capture debug output yet...
------- Comment #20 From 2005-06-15 04:09:32 -------
I have the same problem here. I don't know if it helps, but I could
localize and reproduce the errors.
This problem only occurs when I use wpa_supplicant to connect to an AP. When I
use iwconfig to connect without encryption or with WEP encryption, everything
works fine.
I think it's a problem with TKIP. Whenever I try to connect to my AP, the
driver begins to produce the errors after the state in wpa_supplicant
changes from 4-WAY HANDSHAKE -> GROUP HANDSHAKE. I have to reset the IPW2200
firmware with the kill switch. Then everything works fine. But when I reload
the ieee80211_crypt_tkip module and try again, I get the same errors until
I reset the firmware.

I have tried this with two different APs.

My system is an Acer Travelmate 292 with Ubuntu 5.04 installed. I will try it
with an unpatched kernel later.
------- Comment #21 From 2005-06-15 04:36:26 -------
(In reply to comment #20)
> I have to reset the IPW2200 Firmware with the kill switch.

Is the problem reproducible if you don't reset the kill switch? Does your
dmesg contain something like the lines below? Please attach your dmesg with
debug=255. Did you try the patch in attachment 414 [details]?

> failed to send ASSOCIATE command
> failed to send SCAN_REQUEST_EXT command
> failed to send SYSTEM_CONFIG command
> ipw_send_system_config failed
> failed to send SCAN_REQUEST_EXT command
> No space for Tx
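
One way to answer that question against a saved log is to count the lines
that match the messages quoted above. This is only a sketch: the function name
count_cmd_failures is hypothetical, and the patterns cover just the wording
shown in this comment, not every message the driver can print.

```sh
#!/bin/sh
# Sketch only: count lines in a saved dmesg/syslog file that match the
# command-failure messages quoted in this comment.
# $1 = path to the log file
count_cmd_failures() {
    grep -c -E \
        'failed to send [A-Z_]+ command|ipw_send_system_config failed|No space for Tx' \
        "$1"
}
```

For example, "dmesg > /tmp/ipw.log; count_cmd_failures /tmp/ipw.log" prints 0
when none of the messages are present.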
------- Comment #22 From 2005-06-16 03:18:27 -------
Created an attachment (id=423) [details]
dmesg output without patch
------- Comment #23 From 2005-06-16 03:19:18 -------
Created an attachment (id=424) [details]
dmesg output with patch
------- Comment #24 From 2005-06-16 03:36:12 -------
I have added two files with my dmesg output at debug level 255. I wrote some
comments between the lines to make clear when the problem begins.

For these logs I used the original Ubuntu kernel 2.6.10-5-386 with the ipw2200
driver, with and without patch 414 applied. Both produced the same errors
on my system.

As you can see, there is an endless loop at the end of each file. When I stop
wpa_supplicant and restart it, sometimes the second or third try is
successful, but only after a firmware restart.

On my Travelmate 292 I can reproduce the error by unloading the
ieee80211_crypt_tkip module. After that I turn my kill switch off and on again.
The message "ipw2200: Firmware error detected. Restarting" appears.

My little workaround is to switch the kill switch off and on again without
unloading the module mentioned above. In roughly 70% of my tries I get a
connection afterwards.

This problem only occurs on my laptop when I use WPA encryption. WEP and "no
encryption" work fine. It doesn't matter whether the SSID is hidden or
not.

Hope this helps you a bit. As I said, I'll try it with a vanilla kernel later.
------- Comment #25 From 2005-06-18 03:58:25 -------
I finally found a solution for the problem; maybe someone else can check this
out.

I disabled the hwcrypto option of the ipw2200 driver. There seems to be a
problem with this option and authentication with WPA-PSK / TKIP. Now
everything works fine and I'm happy with a wonderful driver for Linux :-)
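
A sketch of how the option can be disabled, assuming the hwcrypto=0 module
parameter that later comments in this bug use. The function name and the
default file name are hypothetical; where modprobe options live depends on the
distribution.

```sh
#!/bin/sh
# One-off test (needs root, shown as a comment so nothing runs on sourcing):
#   modprobe -r ipw2200
#   modprobe ipw2200 hwcrypto=0

# Sketch only: make the setting persistent by writing a modprobe options line.
# $1 = options file to write (optional; the default name is an assumption)
# Note: this overwrites the target file.
write_hwcrypto_off() {
    conf=${1:-/etc/modprobe.d/ipw2200.conf}
    printf 'options ipw2200 hwcrypto=0\n' > "$conf"
}
```

After writing the options line, the module has to be reloaded for the setting
to take effect.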
------- Comment #26 From 2005-06-19 19:21:07 -------
Patrick, thanks for your report. I've changed the title to indicate that this
bug only happens when using hwcrypto. There are some related fixes in 1.0.5
(coming soon); please verify the bug again when it comes out.
------- Comment #27 From 2005-06-29 18:56:26 -------
*** Bug 716 has been marked as a duplicate of this bug. ***
------- Comment #28 From 2005-07-03 20:52:28 -------
Created an attachment (id=448) [details]
debug information using debug=0x43fff

23:32 timeframe it lost connectivity twice during a large download
23:33 timeframe it lost connectivity once during a large download
23:34:49-50 timeframe it lost connectivity once during a large download
23:35:15-17,34-35 timeframe it lost connectivity twice during a large download
23:36:00-01 timeframe it lost connectivity once during a large download

/var/log/kern contains all the contents for debugging modules and kernel
related issues.
------- Comment #29 From 2005-07-03 20:55:48 -------
Created an attachment (id=449) [details]
debug information using debug=0x43fff

23:32 timeframe it lost connectivity twice during a large download
23:33 timeframe it lost connectivity once during a large download
23:34:49-50 timeframe it lost connectivity once during a large download
23:35:15-17,34-35 timeframe it lost connectivity twice during a large download
23:36:00-01 timeframe it lost connectivity once during a large download

/var/log/kern contains all the contents for debugging modules and kernel
related issues.

The file can also be found at

http://www.ldb-jab.org/bugs/kern

or

http://www.ldb-jab.org/bugs/kern/gz
------- Comment #30 From 2005-07-03 21:00:22 -------
Created an attachment (id=450) [details]
debug information using debug=0x43fff

------- Comment #31 From 2005-07-06 04:07:02 -------
Created an attachment (id=452) [details]
a patch to try

Please load the module without hwcrypto=0 and see if the problem is fixed.
------- Comment #32 From 2005-07-06 04:29:39 -------
For me, your patch is working fine, thanks a lot. No more problems with hwcrypto
enabled.
------- Comment #33 From 2005-07-06 08:48:41 -------
Please don't mark as FIXED before the patch has been included upstream.
------- Comment #34 From 2005-07-07 07:08:43 -------
Marking as fixed in v1.0.5.
------- Comment #35 From 2005-07-26 17:55:18 -------
Those who used to get the "No space for Tx" error in 1.0.4, can you please
comment whether this error is gone in 1.0.6? -thx
------- Comment #36 From 2005-07-26 23:52:59 -------
Not fixed for me. I'm using 1.0.6 now. I tried Yi's patch3 against 1.0.6 and it
did not help either.

I still use hwcrypto=0 in order for the driver to function.
------- Comment #37 From 2005-08-01 13:24:17 -------
Marking as reopened. I still cannot use the driver without the hwcrypto=0
switch. I see that the patch worked for Patrick Renkowski. Is anyone else
still having this trouble as I am?
------- Comment #38 From 2005-08-04 17:16:36 -------
Assigning to Yi.

From scrub:

<chuyee> this one need more look
<chuyee> I think I did something to make it better, but not totally solved
<chuyee> or do you want a walkaround ;)
<logics_sbux> we really need to nail this one
<logics_sbux> unless we suspect it is being caused by a fw lockup
<logics_sbux> (i noticed a NMI in the most recent debug information attachement)
<chuyee> it is not easily reproducable
<logics_sbux> is the bug about seeing the 'no space for tx' or about firmware restarts?
<chuyee> "no space for tx" is caused by a "firmware halt"
<logics_sbux> ok; so the firmware is dying, trasnfer attempts to continue until all the slots are full?
<logics_sbux> s/trasnfer/transfer/
<chuyee> yeah
<chuyee> but only saw with hwcrypto enabled
<logics_sbux> the specific firmware dumps random, or consistently NMIs?
<chuyee> I think the dump is in the late time after quite a lot "no space"
<chuyee> because firmware is in a unstable state at that time. But the initial reason of "firmware halt" is unclear.
<logics_sbux> hmm
<logics_sbux> i wonder if the root cause is actually overflowing the ring buffer
<logics_sbux> that then confuses things
<logics_sbux> meaning our queue full logic might be in error
<chuyee> the error is in the early phase of association
<chuyee> do you mean your recent change for NETIF_TX_FULL or something?
<chuyee> maybe TX_DROP
<logics_sbux> oh; its during association?
<chuyee> yes
<logics_sbux> that's odd.  why would hwcrypto play a role there unless its shared key authentication?
<logics_sbux> (it is consistent that with hwcrypto=0 the problem goes away--correct?)
<chuyee> it sends TGi key
<chuyee> I thought it might be the sequence of SYSTEM_CONFIG and TGI key
<chuyee> but in bug 792 I change the sequence to the right order, it *fix* for me, but still someone see the bug
<logics_sbux> even if its an open or wep AP we set the TGi?
<chuyee> no, only with AES/TKIP
<chuyee> wep and open no problem
<logics_sbux> ok
<salwan> is the dmesg output that submitter provided with 1.0.4 sufficient for now, or do need new dmesg info with 1.0.6?
<chuyee> yechun said he sometimes can reproduce the bug on his laptop, I can use his laptop to reproduce
<chuyee> so I don't need the log this time
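
The mechanism described in the scrub (firmware halts, transmit attempts
continue until every slot is full, then "No space for Tx" appears) can be
illustrated with a toy model. This is not driver code: the slot count, the
function names and the one-slot-per-tick drain rate are all assumptions made
for the illustration.

```sh
#!/bin/sh
# Toy model only: a fixed-size Tx queue whose consumer (the "firmware") stops.
QUEUE_SLOTS=8        # hypothetical queue size, not the driver's real value
used=0
firmware_alive=1

# Try to queue one packet; fails once every slot is occupied.
tx_enqueue() {
    if [ "$used" -ge "$QUEUE_SLOTS" ]; then
        echo "No space for Tx"
        return 1
    fi
    used=$((used + 1))
}

# A live firmware frees one slot per tick; a halted one frees none.
firmware_tick() {
    if [ "$firmware_alive" -eq 1 ] && [ "$used" -gt 0 ]; then
        used=$((used - 1))
    fi
}

# $1 = packets to send, $2 = packet index at which the firmware halts.
# Prints how many packets were rejected.
simulate() {
    used=0; firmware_alive=1; dropped=0; i=0
    while [ "$i" -lt "$1" ]; do
        if [ "$i" -eq "$2" ]; then firmware_alive=0; fi
        tx_enqueue >/dev/null || dropped=$((dropped + 1))
        firmware_tick
        i=$((i + 1))
    done
    echo "$dropped"
}

# simulate 20 5     prints: 7   (halt at packet 5, queue fills, 7 rejected)
# simulate 20 100   prints: 0   (firmware never halts)
```

In the model the error only starts some packets after the halt, which matches
the observation in the log that the firmware dump shows up late, after many
"no space" messages.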
------- Comment #39 From 2005-09-19 19:13:00 -------
*** Bug 791 has been marked as a duplicate of this bug. ***
------- Comment #40 From 2005-11-03 21:16:04 -------
*** Bug 828 has been marked as a duplicate of this bug. ***
------- Comment #41 From 2006-02-15 00:18:41 -------
please try ipw2200-1.0.11
------- Comment #42 From 2006-02-17 01:28:57 -------
I don't know about everyone else on this bug, but I haven't needed hwcrypto=0
for this since 1.0.10 came out (I use it for unrelated reasons. :) )
------- Comment #43 From 2006-10-09 13:52:50 -------
I'm seeing something very similar to this bug in version 1.1.2kmprq in kernel
2.6.18-rc7.  If hwcrypto is enabled, the wireless link stops working after a few
seconds of heavy load, with the [ipw2200/0] process eating 100% cpu and the
following error:

Oct  9 22:41:33 ophelia kernel: [ 1090.151000] ipw2200: No space for Tx
Oct  9 22:41:33 ophelia kernel: [ 1090.151000] ipw2200: Failed to send
SCAN_REQUEST_EXT: Reason -16
------- Comment #44 From 2006-10-09 19:43:47 -------
(In reply to comment #43)
> Oct  9 22:41:33 ophelia kernel: [ 1090.151000] ipw2200: No space for Tx
> Oct  9 22:41:33 ophelia kernel: [ 1090.151000] ipw2200: Failed to send
> SCAN_REQUEST_EXT: Reason -16

Can you attach the full log please? 

------- Comment #45 From 2006-10-10 17:20:19 -------
need more info