Futex Spinning on CPU
Lately we had issue with SQL running for long and after investigating found that it was just hanging/spinning on CPU. Even there was no system stats that were increasing
SID USERNAME STATE EVENT SEQ# SEC_IN_STATE SQL_ID CHN P1 P2 P3
------- ---------- ------- ---------------------------------------- ---------- ------------ ------------- ---- ------------------------- ------------------------- -----------------------
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12284 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12287 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12288 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12289 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12290 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12291 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12291 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12292 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12293 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433
----------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
-1, , ENQG, AE - Edition Lock , 3, .3,
-1, , ENQG, CF - Controlfile Transaction , 20, 2,
-1, , ENQG, CO - KTUCLO Master Slave enq , 3, .3,
-1, , ENQG, DD - ASM Local Disk Group , 4, .4,
-1, , ENQG, JD - Job Queue Date , 2, .2,
-1, , ENQG, JS - Job Scheduler , 316, 31.6,
-1, , ENQG, MR - Media Recovery , 4, .4,
-1, , ENQG, PS - PX Process Reservation , 9, .9,
-1, , ENQG, SE - Session Migration , 3, .3,
-1, , ENQG, TM - DML , 51, 5.1,
-1, , ENQG, TX - Transaction , 13, 1.3,
-1, , ENQG, XR - Quiesce / Force Logging , 3, .3,
-1, , LATG, post/wait queue , 27, 2.7,
-1, , LATG, test excl. parent l0 , 2, .2,
-1, , LATG, test excl. parent2 l0 , 2, .2,
-1, , LATG, process allocation , 59, 5.9,
-1, , LATG, session allocation , 78, 7.8,
-1, , LATG, session switching , 3, .3,
-1, , LATG, process group creation , 9, .9,
-1, , LATG, session idle bit , 141, 14.1,
-1, , LATG, client/application info , 44, 4.4,
-1, , LATG, longop free list parent , 2, .2,
-1, , LATG, ksuosstats global area , 1, .1,
-1, , LATG, ksim membership request latch , 6, .6,
-1, , LATG, kss move lock , 2, .2,
-1, , LATG, parameter table management , 15, 1.5,
-1, , LATG, messages , 481, 48.1,
-1, , LATG, enqueues , 713, 71.3,
-1, , LATG, enqueue hash chains , 986, 98.6,
-1, , LATG, FOB s.o list latch , 8, .8,
-1, , LATG, ksfv messages , 2, .2,
-1, , LATG, msg queue latch , 2, .2,
-1, , LATG, done queue latch , 2, .2,
-1, , LATG, session queue latch , 2, .2,
-1, , LATG, resmgr:free threads list , 15, 1.5,
-1, , LATG, resmgr group change latch , 8, .8,
-1, , LATG, channel handle pool latch , 11, 1.1,
-1, , LATG, channel operations parent latch , 1493, 149.3,
-1, , LATG, message pool operations parent latch , 30, 3,
-1, , LATG, ksv class latch , 14, 1.4,
-1, , LATG, ksv msg queue latch , 2, .2,
-1, , LATG, ksv allocation latch , 1, .1,
-1, , LATG, second spare latch , 2, .2,
-1, , LATG, third spare latch , 2, .2,
-1, , LATG, fifth spare latch , 2, .2,
-1, , LATG, ksxp shared latch , 9, .9,
-1, , LATG, IPC stats buffer allocation latch , 59, 5.9,
-1, , LATG, granule operation , 2, .2,
-1, , LATG, active service list , 104, 10.4,
-1, , LATG, OS process allocation , 13, 1.3,
-1, , LATG, OS process , 20, 2,
-1, , LATG, OS process: request allocation , 9, .9,
-1, , LATG, ksz_so allocation latch , 9, .9,
-1, , LATG, ges process table freelist , 9, .9,
-1, , LATG, ges process parent latch , 787, 78.7,
-1, , LATG, ges process hash list , 1847, 184.7,
-1, , LATG, ges resource table freelist , 43, 4.3,
-1, , LATG, ges caches resource lists , 260, 26,
-1, , LATG, ges resource hash list , 644, 64.4,
-1, , LATG, ges resource scan list , 2, .2,
-1, , LATG, ges enqueue table freelist , 393, 39.3,
-1, , LATG, ges timeout list , 4, .4,
-1, , LATG, KJC message pool free list , 24, 2.4,
-1, , LATG, KJCT flow control latch , 362, 36.2,
-1, , LATG, ges domain table , 379, 37.9,
-1, , LATG, ges group table , 571, 57.1,
-1, , LATG, gcs resource hash , 246, 24.6,
-1, , LATG, gcs opaque info freelist , 44, 4.4,
-1, , LATG, gcs resource freelist , 20, 2,
-1, , LATG, gcs resource scan list , 2, .2,
-1, , LATG, gcs shadows freelist , 8, .8,
-1, , LATG, gcs commit scn state , 2, .2,
-1, , LATG, name-service request queue , 104, 10.4,
-1, , LATG, name-service pending queue , 4, .4,
-1, , LATG, name-service namespace bucket , 63, 6.3,
-1, , LATG, name-service memory objects , 20, 2,
-1, , LATG, gcs partitioned table hash , 24776, 2.48k,
-1, , LATG, gcs pcm hashed value bucket hash , 2, .2,
-1, , LATG, recovery domain hash list , 2, .2,
-1, , LATG, ges value block free list , 2, .2,
-1, , LATG, Testing , 2, .2,
-1, , LATG, heartbeat check , 5, .5,
-1, , LATG, KMG MMAN ready and startup request latch, 4, .4,
-1, , LATG, Memory Management Latch , 6, .6,
-1, , LATG, mostly latch-free SCN , 20, 2,
-1, , LATG, lgwr LWN SCN , 20, 2,
-1, , LATG, Consistent RBA , 19, 1.9,
-1, , LATG, cache buffers lru chain , 688, 68.8,
-1, , LATG, buffer pool , 2, .2,
-1, , LATG, active checkpoint queue latch , 24, 2.4,
-1, , LATG, checkpoint queue latch , 1202, 120.2,
-1, , LATG, cache buffers chains , 2012, 201.2,
-1, , LATG, cache buffer handles , 8, .8,
-1, , LATG, multiblock read objects , 54, 5.4,
-1, , LATG, tablespace key chain , 11, 1.1,
-1, , LATG, simulator lru latch , 29, 2.9,
-1, , LATG, simulator hash latch , 29, 2.9,
-1, , LATG, sim partition latch , 2, .2,
-1, , LATG, object queue header operation , 1998, 199.8,
-1, , LATG, object queue header heap , 291, 29.1,
-1, , LATG, archive control , 1, .1,
-1, , LATG, archive process latch , 3, .3,
-1, , LATG, redo writing , 87, 8.7,
-1, , LATG, redo copy , 149, 14.9,
-1, , LATG, redo allocation , 208, 20.8,
-1, , LATG, gc element , 582, 58.2,
-1, , LATG, logminer work area , 2, .2,
-1, , LATG, logminer context allocation , 2, .2,
-1, , LATG, logical standby cache , 2, .2,
-1, , LATG, mapped buffers lru chain , 2, .2,
-1, , LATG, lock DBA buffer during media recovery , 2, .2,
-1, , LATG, flashback copy , 2, .2,
-1, , LATG, cvmap freelist lock , 2, .2,
-1, , LATG, io pool granule metadata list , 2, .2,
-1, , LATG, dml lock allocation , 1, .1,
-1, , LATG, DML lock allocation , 104, 10.4,
-1, , LATG, list of block allocation , 9, .9,
-1, , LATG, transaction allocation , 11, 1.1,
-1, , LATG, dummy allocation , 17, 1.7,
-1, , LATG, transaction branch allocation , 8, .8,
-1, , LATG, commit callback allocation , 4, .4,
-1, , LATG, undo global data , 46, 4.6,
-1, , LATG, corrupted undo seg lock , 3, .3,
-1, , LATG, MinActiveScn Latch , 127, 12.7,
-1, , LATG, intra txn parallel recovery , 2, .2,
-1, , LATG, In memory undo latch , 2, .2,
-1, , LATG, KTF sga latch , 2, .2,
-1, , LATG, Change Notification Hash table latch , 3, .3,
-1, , LATG, change notification client cache latch , 2, .2,
-1, , LATG, sort extent pool , 1, .1,
-1, , LATG, lob segment hash table latch , 2, .2,
-1, , LATG, lob segment query latch , 2, .2,
-1, , LATG, lob segment dispenser latch , 2, .2,
-1, , LATG, space background task latch , 13, 1.3,
-1, , LATG, sequence cache , 9, .9,
-1, , LATG, File State Object Pool Parent Latch , 2, .2,
-1, , LATG, Write State Object Pool Parent Latch , 2, .2,
-1, , LATG, deferred cleanup latch , 1, .1,
-1, , LATG, kdlx hb parent latch , 2, .2,
-1, , LATG, Locator state objects pool parent latch , 2, .2,
-1, , LATG, row cache objects , 435, 43.5,
-1, , LATG, kks stats , 2, .2,
-1, , LATG, global KZLD latch for mem in SGA , 3, .3,
-1, , LATG, user lock , 18, 1.8,
-1, , LATG, global ctx hash table latch , 13, 1.3,
-1, , LATG, global tx hash mapping , 2, .2,
-1, , LATG, k2q lock allocation , 2, .2,
-1, , LATG, k2q global data latch , 6, .6,
-1, , LATG, shared pool , 455, 45.5,
-1, , LATG, Token Manager , 2, .2,
-1, , LATG, cas latch , 2, .2,
-1, , LATG, rm cas latch , 2, .2,
-1, , LATG, resmgr:actses change state , 2, .2,
-1, , LATG, resmgr:actses change group , 8, .8,
-1, , LATG, resmgr:session queuing , 2, .2,
-1, , LATG, resmgr:active threads , 15, 1.5,
-1, , LATG, resmgr:plan CPU method , 2, .2,
-1, , LATG, resmgr:resource group CPU method , 2, .2,
-1, , LATG, QMT , 2, .2,
-1, , LATG, shared pool simulator , 2, .2,
-1, , LATG, shared pool sim alloc , 2, .2,
-1, , LATG, Streams Generic , 2, .2,
-1, , LATG, Shared B-Tree , 5, .5,
-1, , LATG, Memory Queue , 2, .2,
-1, , LATG, Memory Queue Subscriber , 2, .2,
-1, , LATG, Memory Queue Message Subscriber #1 , 2, .2,
-1, , LATG, Memory Queue Message Subscriber #2 , 2, .2,
-1, , LATG, Memory Queue Message Subscriber #3 , 2, .2,
-1, , LATG, Memory Queue Message Subscriber #4 , 2, .2,
-1, , LATG, pesom_hash_node , 2, .2,
-1, , LATG, pesom_free_list , 2, .2,
-1, , LATG, peshm , 2, .2,
-1, , LATG, Mutex , 2, .2,
-1, , LATG, Mutex Stats , 2, .2,
-1, , LATG, KGNFS-NFS:SHM structure , 2, .2,
-1, , LATG, KGNFS-NFS:SVR LIST , 2, .2,
-1, , LATG, SGA heap locks , 2, .2,
-1, , LATG, SGA pool locks , 2, .2,
-1, , LATG, SGA bucket locks , 2, .2,
-1, , LATG, SGA blob parent , 2, .2,
-1, , LATG, kgb parent , 2, .2,
-1, , LATG, Event Group Locks , 5, .5,
-1, , LATG, session timer , 8, .8,
-1, , LATG, parameter list , 97, 9.7,
-1, , LATG, virtual circuit buffers , 2, .2,
-1, , LATG, virtual circuit queues , 2, .2,
-1, , LATG, virtual circuit holder , 2, .2,
-1, , LATG, cp server hash latch , 2, .2,
-1, , LATG, cp pool latch , 2, .2,
-1, , LATG, cp cmon/server latch , 2, .2,
-1, , LATG, cp sga latch , 1, .1,
-1, , LATG, job_queue_processes parameter latch , 2, .2,
-1, , LATG, job workq parent latch , 2, .2,
-1, , LATG, query server freelists , 15, 1.5,
-1, , LATG, error message lists , 5, .5,
-1, , LATG, process queue , 19, 1.9,
-1, , LATG, process queue reference , 568, 56.8,
-1, , LATG, parallel query stats , 8, .8,
-1, , LATG, business card , 4, .4,
-1, , LATG, parallel query alloc buffer , 38, 3.8,
-1, , LATG, SQL memory manager latch , 3, .3,
-1, , LATG, SQL memory manager workarea list latch , 215, 21.5,
-1, , LATG, compile environment latch , 6, .6,
-1, , LATG, kokc descriptor allocation latch , 8, .8,
-1, , LATG, ncodef allocation latch , 1, .1,
-1, , LATG, session state list latch , 6, .6,
-1, , LATG, KWQS pqueue ctx latch , 1, .1,
-1, , LATG, KWQS pqsubs latch , 1, .1,
-1, , LATG, kwqbsn:qsga , 1, .1,
-1, , LATG, qmn task queue latch , 4, .4,
-1, , LATG, KWQMN job cache list latch , 3, .3,
-1, , LATG, XDB unused session pool , 2, .2,
-1, , LATG, XDB used session pool , 2, .2,
-1, , LATG, XDB NFS Security Latch , 2, .2,
-1, , LATG, KFC Hash Latch , 2, .2,
-1, , LATG, KFC FX Hash Latch , 2, .2,
-1, , LATG, ASM map operation hash table , 186, 18.6,
-1, , LATG, ASM network background latch , 4, .4,
-1, , LATG, ASM db client latch , 6, .6,
-1, , LATG, KFR redo allocation latch , 2, .2,
-1, , LATG, KFCL LE Freelist , 2, .2,
-1, , LATG, JS broadcast add buf latch , 2, .2,
-1, , LATG, JS broadcast drop buf latch , 2, .2,
-1, , LATG, JS broadcast load blnc latch , 2, .2,
-1, , LATG, JS slv state obj latch , 2, .2,
-1, , LATG, JS queue state obj latch , 312, 31.2,
-1, , LATG, JS queue access latch , 2, .2,
-1, , LATG, JS Sh mem access , 4, .4,
-1, , LATG, PL/SQL warning settings , 9, .9,
-- End of snap 1, end=2012-04-02 21:37:13, seconds=10
CH# PARENT_HANDLE OBJECT_HANDLE PLAN_HASH PARSES H_PARSES EXECUTIONS FETCHES ROWS_PROCESSED CPU_MS ELA_MS LIOS PIOS SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- ---------------
0 000000041CE5D250 000000041CE5D100 1758161767 1 1 0 0 0 154999.436 297611.236 8382945 707682 1 1
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
<-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508
futex(0x3596554594, FUTEX_WAIT_PRIVATE, 2, NULL
#0 0x00000035962e027e in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x000000359628d56d in _L_lock_1685 () from /lib64/libc.so.6
#2 0x000000359628d2b7 in __tz_convert () from /lib64/libc.so.6
#3 0x0000000001db392a in sldxgd ()
#4 0x000000000339ddfd in nldatxtmil ()
#5 0x000000000339dd20 in nldatxt ()
#6 0x00000000076c6135 in nstimexp ()
#7 0x00000000064c1bca in ltmdvp ()
#8 0x00000000064c1ae3 in ltmdrv ()
#9 0x00000000064a6fc0 in sltmdf ()
#10 0x0000000006496325 in sslsstehdlr ()
#11 0x00000000064957e9 in sslsshandler ()
#12 <signal handler called>
#13 0x000000359628d88f in __tzfile_compute () from /lib64/libc.so.6
#14 0x000000359628d2ef in __tz_convert () from /lib64/libc.so.6
#15 0x0000000007ccdbb2 in sLdiGetDate ()
#16 0x00000000019104b7 in psdsts ()
#17 0x0000000002b0ffe4 in pessts ()
#18 0x0000000007dd6fae in pevm_icd_call_common ()
#19 0x0000000007dce9c3 in pfrinstr_BCAL ()
#20 0x0000000007dcdc3f in pfrrun_no_tool ()
#21 0x0000000002a276e7 in pfrrun ()
#22 0x0000000002a38c7d in plsql_run ()
#23 0x0000000002a1ebc4 in peicnt ()
#24 0x0000000002a1ea0d in peiet_execute_trigger ()
#25 0x00000000023f8080 in kkxtexe ()
#26 0x00000000019437dc in kxtExecuteTriggerRecursive ()
#27 0x0000000007c1267a in rpiswu2 ()
#28 0x0000000001942f66 in kxtExecuteTriggerReal ()
#29 0x0000000001942d3f in kxtexe ()
#30 0x0000000001cc468e in qesltcExecuteBeforeRowTriggers ()
#31 0x0000000007cacc8b in qesltcBeforeRowProcessing ()
#32 0x0000000007c8c9a3 in __PGOSF606_qerltcKdtBufferedInsRowCBK ()
#33 0x0000000007c8c47e in qerltcLoadStateMachine ()
#34 0x0000000007c8bf9d in __PGOSF596_qerltcInsertSelectRop ()
#35 0x0000000001c8d868 in qerghFetch ()
#36 0x0000000007afa0a9 in rwsfcd ()
#37 0x0000000007c8a447 in qerltcFetch ()
#38 0x0000000007bd5a1d in insexe ()
#39 0x0000000007c9bfe7 in opiexe ()
#40 0x0000000001cd1dc5 in kpoal8 ()
#41 0x0000000007b4e0d4 in opiodr ()
#42 0x0000000007d0ea8b in ttcpip ()
#43 0x000000000106800d in opitsk ()
#44 0x000000000106aa2a in opiino ()
#45 0x0000000007b4e0d4 in opiodr ()
#46 0x0000000001063d6c in opidrv ()
#47 0x000000000177d4c2 in sou2o ()
#48 0x0000000000975603 in opimai_real ()
#49 0x0000000001782a79 in ssthrdmain ()
#50 0x000000000097552f in main ()
This is from metalink.
Bug 6918493 – Net DCD (sqlnet.expire_time>0) can cause OS level mutex hang, possible to get PMON failed to acquired latch
Using Net dead connection detection (DCD) by setting SQLNET.EXPIRE_TIME > 0 in the SQLNET.ORA file can cause an OS level mutex hang under nsexptim -> … -> localtime_r .. and possibly under other substacks under nsexptim . The hang is very intermittent in nature as it depends on a specific timing scenario.
If the hung process is holding a critical resource, such as a latch, then this can lead to a wider hang scenario.
Rediscovery Notes:
Dead Connection Detection is enabled (SQLNET.EXPIRE_TIME > 0 )
The stack includes nsexptim and is likely to include
localtime_tz_r or localtime_r
The top of the stack shows the session blocked on an
OS mutex. (eg: In __lll_mutex_lock_wait on Linux, or in a
futex call)
If the hung process holds a latch, or other critical
resource, then PMON may give a clue in its tracefile.
Workaround:
Disable DCD and configure OS level keepalive instead.
eg: Set SQLNET.EXPIRE_TIME=0 in the SQLNET.ORA file
and configure TCP keepalive to the required timeouts
at OS level (exact parameters and steps vary between
platforms as TCP keepalive is an OS level configuration)