Bugzilla – Bug 1089
Possible circular locking dependency
Last modified: 2007-06-08 07:56:14
You need to log in before you can comment on or make changes to this bug.
The lock dependency validator in 2.6.18-rc1 detects the following circular locking dependency - at this point I am not sure if the validator is correct or just confused. This same error was sent by email to James and SF mailing list. Logging it here so it doesn't get lost. Thanks. ======================================================= [ INFO: possible circular locking dependency detected ] ------------------------------------------------------- iwconfig/9878 is trying to acquire lock: (&priv->mutex){--..}, at: [<c0336590>] mutex_lock+0x8/0xa but task is already holding lock: (rtnl_mutex){--..}, at: [<c0336590>] mutex_lock+0x8/0xa which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){--..}: [<c012f979>] check_prevs_add+0x47/0xac [<c0130dcf>] __lock_acquire+0x722/0x781 [<c0131196>] lock_acquire+0x49/0x66 [<c033664f>] __mutex_lock_slowpath+0xb3/0x226 [<c0336590>] mutex_lock+0x8/0xa [<c02eb2fd>] rtnl_lock+0xd/0xf [<c02e5787>] register_netdev+0xd/0x77 [<f8b87140>] ipw_bg_calibrated_work+0x69/0x1ac [ipw3945] [<c01286e6>] run_workqueue+0x8b/0xcb [<c012882e>] worker_thread+0x108/0x13c [<c012b43e>] kthread+0x84/0xad [<c0100ec5>] kernel_thread_helper+0x5/0xb -> #0 (&priv->mutex){--..}: [<c012f979>] check_prevs_add+0x47/0xac [<c0130dcf>] __lock_acquire+0x722/0x781 [<c0131196>] lock_acquire+0x49/0x66 [<c033664f>] __mutex_lock_slowpath+0xb3/0x226 [<c0336590>] mutex_lock+0x8/0xa [<f8b92716>] ipw_wx_get_name+0x41/0xd4 [ipw3945] [<c02ee46b>] ioctl_standard_call+0x52/0x245 [<c02ee6b6>] wireless_process_ioctl+0x58/0x320 [<c02e545a>] dev_ioctl+0x35a/0x38d [<c02dae8e>] sock_ioctl+0x1bc/0x1ca [<c01689ea>] do_ioctl+0x26/0x68 [<c0168cbf>] vfs_ioctl+0x18a/0x19b [<c0168d16>] sys_ioctl+0x46/0x64 [<c0102809>] sysenter_past_esp+0x56/0x8d other info that might help us debug this: 1 lock held by iwconfig/9878: #0: (rtnl_mutex){--..}, at: [<c0336590>] mutex_lock+0x8/0xa stack backtrace: [<c01037d8>] show_trace_log_lvl+0x58/0x107 [<c010389d>] show_trace+0x16/0x18 [<c0103963>] dump_stack+0x19/0x1b [<c012f294>] print_circular_bug_tail+0x58/0x61 [<c012f7ac>] check_prev_add+0x30/0x1b6 [<c012f979>] check_prevs_add+0x47/0xac [<c0130dcf>] __lock_acquire+0x722/0x781 [<c0131196>] lock_acquire+0x49/0x66 [<c033664f>] __mutex_lock_slowpath+0xb3/0x226 [<c0336590>] mutex_lock+0x8/0xa [<f8b92716>] ipw_wx_get_name+0x41/0xd4 [ipw3945] [<c02ee46b>] ioctl_standard_call+0x52/0x245 [<c02ee6b6>] wireless_process_ioctl+0x58/0x320 [<c02e545a>] dev_ioctl+0x35a/0x38d [<c02dae8e>] sock_ioctl+0x1bc/0x1ca [<c01689ea>] do_ioctl+0x26/0x68 [<c0168cbf>] vfs_ioctl+0x18a/0x19b [<c0168d16>] sys_ioctl+0x46/0x64 [<c0102809>] sysenter_past_esp+0x56/0x8d [<b7f33410>] 0xb7f33410
Bumping to p1/normal to investigate.
Created an attachment (id=848) [details] patch to try
The priv->mutex --> rtnl_mutex sequence (ipw_bg_calibrated_work -> register_netdev) only happened at initialization when registering the net_device. And only after that time the ipw_is_ready will be true. ipw_wx_get_name will test ipw_is_ready before trying to acquire the priv->mutex (it already holds the rtnl_mutex). So the deadlock will not appear. But scan through the code, some ipw_wx_XXX doesn't do this test. I've attached a patch to test ipw_is_ready before acquiring the priv->muxtex in all wireless ioctl handlers (ipw_wx_XXX) to avoid such kind of deadlock.
Hmm... Yi Zhu has following different take on the problem - ---------------- From: Zhu Yi <yi.zhu@intel.com> Yup. It's a possible deadlock. 1. dev_ioctl() calls wireless_process_ioctl() with the rtnl mutex held. Most ipw ioctl callbacks request priv->mutex. 2. ipw_bg_calibrated_work() holds priv->mutex first. Then it calls register_netdev() if the card is not registered to netdev. register_netdev() request the rtnl metux. The above two scenarios will cause deadlock. --------------- Yi has suggested a patch to try - I'll attach it here for further discussion and give it a try.
Created an attachment (id=852) [details] From: Zhu Yi <yi.zhu@intel.com> - patch to fix potential circular dep
Could the bug be related to what I am seeing under 2.6.17-1.2145_FC5smp? Jul 9 11:32:26 tako kernel: BUG: soft lockup detected on CPU#0! Jul 9 11:32:26 tako kernel: <c044a9b6> softlockup_tick+0xad/0xc4 <c042d874> update_process_times+0x39/0x5c Jul 9 11:32:26 tako kernel: <c0418af3> smp_apic_timer_interrupt+0x5a/0x63 <c040490f> apic_timer_interrupt+0x1f/0x24 Jul 9 11:32:26 tako kernel: <c0411f23> delay_pmtmr+0xb/0x13 <c04e9999> __delay+0x9/0xa Jul 9 11:32:26 tako kernel: <f8a474d4> ipw_bg_alive_start+0xc0/0x1a1 [ipw3945] <c043317e> run_workqueue+0x86/0xc6 Jul 9 11:32:26 tako kernel: <c0436650> remove_wait_queue+0xaf/0xb9 <f8a47414> ipw_bg_alive_start+0x0/0x1a1 [ipw3945] Jul 9 11:32:26 tako kernel: <c0433ade> worker_thread+0x0/0x106 <c0433bb3> worker_thread+0xd5/0x106 Jul 9 11:32:26 tako kernel: <c041f2ff> default_wake_function+0x0/0xc <c04362e3> kthread+0x9d/0xc9 Jul 9 11:32:26 tako kernel: <c0436246> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb
(In reply to comment #6) > Jul 9 11:32:26 tako kernel: BUG: soft lockup detected on CPU#0! > Jul 9 11:32:26 tako kernel: <c044a9b6> softlockup_tick+0xad/0xc4 <c042d874> .... > Jul 9 11:32:26 tako kernel: <c0411f23> delay_pmtmr+0xb/0x13 <c04e9999> > __delay+0x9/0xa > Jul 9 11:32:26 tako kernel: <f8a474d4> ipw_bg_alive_start+0xc0/0x1a1 [ipw3945] The only possible problematic area is we use udelay+busy loop , waiting for the thermal sensor to kick in. Please use debug=0x4 when loading the driver, and attach the debug log.
According to comment #7, mark it as NEEDMOREDATA.
Created an attachment (id=859) [details] patch patch combining the two previous patches
set as TESTED_PATCH_EXISTS for the lock dependency problem added a new bug #1096 for the soft lockup problem Thanks, Hong
*** Bug 1138 has been marked as a duplicate of this bug. ***
Merged for 1.1.2
Parag, can you help to verify this bug against the latest ipw3945 release?
Should not have this issue in 1.2.1. Verified it. --Songbo