Bug 1089 - Possible circular locking dependency
: Possible circular locking dependency
Status: VERIFIED FIXED
: IPW3945
Driver Load
: 1.1.0-pre2
: All All
: P1 normal
Assigned To:
:
:
:
:
:
  Show dependency treegraph
 
Reported: 2006-07-06 14:22 by
Modified: 2007-06-08 07:56 (History)


Attachments
patch to try (4.94 KB, patch)
2006-07-06 20:09, Liu, Hong
Details | Diff
From: Zhu Yi <yi.zhu@intel.com> - patch to fix potential circular dep (1.13 KB, patch)
2006-07-10 09:50, Parag Warudkar
Details | Diff
patch (5.84 KB, patch)
2006-07-14 01:54, Liu, Hong
Details | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2006-07-06 14:22:11
The lock dependency validator in 2.6.18-rc1 detects the following circular
locking dependency - at this point I am not sure if the validator is correct
or just confused. This same error was sent by email to James and SF mailing 
list. Logging it here so it doesn't get lost. Thanks.

=======================================================
[ INFO: possible circular locking dependency detected ]
-------------------------------------------------------
iwconfig/9878 is trying to acquire lock:
 (&priv->mutex){--..}, at: [<c0336590>] mutex_lock+0x8/0xa

but task is already holding lock:
 (rtnl_mutex){--..}, at: [<c0336590>] mutex_lock+0x8/0xa

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){--..}:
       [<c012f979>] check_prevs_add+0x47/0xac
       [<c0130dcf>] __lock_acquire+0x722/0x781
       [<c0131196>] lock_acquire+0x49/0x66
       [<c033664f>] __mutex_lock_slowpath+0xb3/0x226
       [<c0336590>] mutex_lock+0x8/0xa
       [<c02eb2fd>] rtnl_lock+0xd/0xf
       [<c02e5787>] register_netdev+0xd/0x77
       [<f8b87140>] ipw_bg_calibrated_work+0x69/0x1ac [ipw3945]
       [<c01286e6>] run_workqueue+0x8b/0xcb
       [<c012882e>] worker_thread+0x108/0x13c
       [<c012b43e>] kthread+0x84/0xad
       [<c0100ec5>] kernel_thread_helper+0x5/0xb

-> #0 (&priv->mutex){--..}:
       [<c012f979>] check_prevs_add+0x47/0xac
       [<c0130dcf>] __lock_acquire+0x722/0x781
       [<c0131196>] lock_acquire+0x49/0x66
       [<c033664f>] __mutex_lock_slowpath+0xb3/0x226
       [<c0336590>] mutex_lock+0x8/0xa
       [<f8b92716>] ipw_wx_get_name+0x41/0xd4 [ipw3945]
       [<c02ee46b>] ioctl_standard_call+0x52/0x245
       [<c02ee6b6>] wireless_process_ioctl+0x58/0x320
       [<c02e545a>] dev_ioctl+0x35a/0x38d
       [<c02dae8e>] sock_ioctl+0x1bc/0x1ca
       [<c01689ea>] do_ioctl+0x26/0x68
       [<c0168cbf>] vfs_ioctl+0x18a/0x19b
       [<c0168d16>] sys_ioctl+0x46/0x64
       [<c0102809>] sysenter_past_esp+0x56/0x8d

other info that might help us debug this:

1 lock held by iwconfig/9878:
 #0:  (rtnl_mutex){--..}, at: [<c0336590>] mutex_lock+0x8/0xa

stack backtrace:
 [<c01037d8>] show_trace_log_lvl+0x58/0x107
 [<c010389d>] show_trace+0x16/0x18
 [<c0103963>] dump_stack+0x19/0x1b
 [<c012f294>] print_circular_bug_tail+0x58/0x61
 [<c012f7ac>] check_prev_add+0x30/0x1b6
 [<c012f979>] check_prevs_add+0x47/0xac
 [<c0130dcf>] __lock_acquire+0x722/0x781
 [<c0131196>] lock_acquire+0x49/0x66
 [<c033664f>] __mutex_lock_slowpath+0xb3/0x226
 [<c0336590>] mutex_lock+0x8/0xa
 [<f8b92716>] ipw_wx_get_name+0x41/0xd4 [ipw3945]
 [<c02ee46b>] ioctl_standard_call+0x52/0x245
 [<c02ee6b6>] wireless_process_ioctl+0x58/0x320
 [<c02e545a>] dev_ioctl+0x35a/0x38d
[<c02dae8e>] sock_ioctl+0x1bc/0x1ca
 [<c01689ea>] do_ioctl+0x26/0x68
 [<c0168cbf>] vfs_ioctl+0x18a/0x19b
 [<c0168d16>] sys_ioctl+0x46/0x64
 [<c0102809>] sysenter_past_esp+0x56/0x8d
 [<b7f33410>] 0xb7f33410
------- Comment #1 From 2006-07-06 17:31:34 -------
Bumping to p1/normal to investigate.
------- Comment #2 From 2006-07-06 20:09:08 -------
Created an attachment (id=848) [details]
patch to try
------- Comment #3 From 2006-07-06 20:10:34 -------
The priv->mutex --> rtnl_mutex sequence
(ipw_bg_calibrated_work -> register_netdev) only happened at initialization when
registering the net_device.
And only after that time the ipw_is_ready will be true.

ipw_wx_get_name will test ipw_is_ready before trying to acquire the priv->mutex
(it already holds the rtnl_mutex). So the deadlock will not appear.

But scan through the code, some ipw_wx_XXX doesn't do this test.
I've attached a patch to test ipw_is_ready before acquiring the priv->muxtex in
all wireless ioctl handlers (ipw_wx_XXX) to avoid such kind of deadlock.
------- Comment #4 From 2006-07-10 09:48:40 -------
Hmm... Yi Zhu has following different take on the problem -
----------------
From:  	Zhu Yi <yi.zhu@intel.com> 
Yup. It's a possible deadlock.

1. dev_ioctl() calls wireless_process_ioctl() with the rtnl mutex held.
Most ipw ioctl callbacks request priv->mutex.

2. ipw_bg_calibrated_work() holds priv->mutex first. Then it calls
register_netdev() if the card is not registered to netdev.
register_netdev() request the rtnl metux.

The above two scenarios will cause deadlock.
---------------

Yi has suggested a patch to try - I'll attach it here for further discussion and
give it a try. 
------- Comment #5 From 2006-07-10 09:50:18 -------
Created an attachment (id=852) [details]
From:	Zhu Yi <yi.zhu@intel.com> - patch to fix potential circular dep
------- Comment #6 From 2006-07-10 16:20:17 -------
Could the bug be related to what I am seeing under 2.6.17-1.2145_FC5smp?

Jul  9 11:32:26 tako kernel: BUG: soft lockup detected on CPU#0!
Jul  9 11:32:26 tako kernel:  <c044a9b6> softlockup_tick+0xad/0xc4  <c042d874>
update_process_times+0x39/0x5c
Jul  9 11:32:26 tako kernel:  <c0418af3> smp_apic_timer_interrupt+0x5a/0x63 
<c040490f> apic_timer_interrupt+0x1f/0x24
Jul  9 11:32:26 tako kernel:  <c0411f23> delay_pmtmr+0xb/0x13  <c04e9999>
__delay+0x9/0xa
Jul  9 11:32:26 tako kernel:  <f8a474d4> ipw_bg_alive_start+0xc0/0x1a1 [ipw3945]
 <c043317e> run_workqueue+0x86/0xc6
Jul  9 11:32:26 tako kernel:  <c0436650> remove_wait_queue+0xaf/0xb9  <f8a47414>
ipw_bg_alive_start+0x0/0x1a1 [ipw3945]
Jul  9 11:32:26 tako kernel:  <c0433ade> worker_thread+0x0/0x106  <c0433bb3>
worker_thread+0xd5/0x106
Jul  9 11:32:26 tako kernel:  <c041f2ff> default_wake_function+0x0/0xc 
<c04362e3> kthread+0x9d/0xc9
Jul  9 11:32:26 tako kernel:  <c0436246> kthread+0x0/0xc9  <c0402005>
kernel_thread_helper+0x5/0xb
------- Comment #7 From 2006-07-10 20:32:10 -------
(In reply to comment #6)
> Jul  9 11:32:26 tako kernel: BUG: soft lockup detected on CPU#0!
> Jul  9 11:32:26 tako kernel:  <c044a9b6> softlockup_tick+0xad/0xc4  <c042d874>
....
> Jul  9 11:32:26 tako kernel:  <c0411f23> delay_pmtmr+0xb/0x13  <c04e9999>
> __delay+0x9/0xa
> Jul  9 11:32:26 tako kernel:  <f8a474d4> ipw_bg_alive_start+0xc0/0x1a1 [ipw3945]

The only possible problematic area is we use udelay+busy loop , waiting for the
thermal sensor to kick in.
Please use debug=0x4 when loading the driver, and attach the debug log.
------- Comment #8 From 2006-07-13 02:24:53 -------
According to comment #7, mark it as NEEDMOREDATA.
------- Comment #9 From 2006-07-14 01:54:35 -------
Created an attachment (id=859) [details]
patch

patch combining the two previous patches
------- Comment #10 From 2006-07-14 02:08:32 -------
set as TESTED_PATCH_EXISTS for the lock dependency problem

added a new bug #1096 for the soft lockup problem

Thanks,
Hong
------- Comment #11 From 2006-10-12 19:34:05 -------
*** Bug 1138 has been marked as a duplicate of this bug. ***
------- Comment #12 From 2006-11-01 12:46:53 -------
Merged for 1.1.2
------- Comment #13 From 2007-02-04 19:09:39 -------
Parag, can you help to verify this bug against the latest ipw3945 release?
------- Comment #14 From 2007-06-08 07:56:14 -------
Should not have this issue in 1.2.1. Verified it.

--Songbo