Futex Spinning on CPU
April 9, 2012 Leave a comment
Lately we had issue with SQL running for long and after investigating found that it was just hanging/spinning on CPU. Even there was no system stats that were increasing
SID USERNAME STATE EVENT SEQ# SEC_IN_STATE SQL_ID CHN P1 P2 P3 ------- ---------- ------- ---------------------------------------- ---------- ------------ ------------- ---- ------------------------- ------------------------- ----------------------- 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12284 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12287 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12288 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12289 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12290 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12291 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12291 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12292 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 1281 XXXXXXXXXXX WORKING On CPU / runqueue 50039 12293 xxxxxxxxxxxxx 0 file#= 283 block#= 1702095 id#= 33554433 ---------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH ---------------------------------------------------------------------------------------------------------------------- -1, , ENQG, AE - Edition Lock , 3, .3, -1, , ENQG, CF - Controlfile Transaction , 20, 2, -1, , ENQG, CO - KTUCLO Master Slave enq , 3, .3, -1, , ENQG, DD - ASM Local Disk Group , 4, .4, -1, , ENQG, JD - Job Queue Date , 2, .2, -1, , ENQG, JS - Job Scheduler , 316, 31.6, -1, , ENQG, MR - Media Recovery , 4, .4, -1, , ENQG, PS - PX Process Reservation , 9, .9, -1, , ENQG, SE - Session Migration , 3, .3, -1, , ENQG, TM - DML , 51, 5.1, -1, , ENQG, TX - Transaction , 13, 1.3, -1, , ENQG, XR - Quiesce / Force Logging , 3, .3, -1, , LATG, post/wait queue , 27, 2.7, -1, , LATG, test excl. parent l0 , 2, .2, -1, , LATG, test excl. parent2 l0 , 2, .2, -1, , LATG, process allocation , 59, 5.9, -1, , LATG, session allocation , 78, 7.8, -1, , LATG, session switching , 3, .3, -1, , LATG, process group creation , 9, .9, -1, , LATG, session idle bit , 141, 14.1, -1, , LATG, client/application info , 44, 4.4, -1, , LATG, longop free list parent , 2, .2, -1, , LATG, ksuosstats global area , 1, .1, -1, , LATG, ksim membership request latch , 6, .6, -1, , LATG, kss move lock , 2, .2, -1, , LATG, parameter table management , 15, 1.5, -1, , LATG, messages , 481, 48.1, -1, , LATG, enqueues , 713, 71.3, -1, , LATG, enqueue hash chains , 986, 98.6, -1, , LATG, FOB s.o list latch , 8, .8, -1, , LATG, ksfv messages , 2, .2, -1, , LATG, msg queue latch , 2, .2, -1, , LATG, done queue latch , 2, .2, -1, , LATG, session queue latch , 2, .2, -1, , LATG, resmgr:free threads list , 15, 1.5, -1, , LATG, resmgr group change latch , 8, .8, -1, , LATG, channel handle pool latch , 11, 1.1, -1, , LATG, channel operations parent latch , 1493, 149.3, -1, , LATG, message pool operations parent latch , 30, 3, -1, , LATG, ksv class latch , 14, 1.4, -1, , LATG, ksv msg queue latch , 2, .2, -1, , LATG, ksv allocation latch , 1, .1, -1, , LATG, second spare latch , 2, .2, -1, , LATG, third spare latch , 2, .2, -1, , LATG, fifth spare latch , 2, .2, -1, , LATG, ksxp shared latch , 9, .9, -1, , LATG, IPC stats buffer allocation latch , 59, 5.9, -1, , LATG, granule operation , 2, .2, -1, , LATG, active service list , 104, 10.4, -1, , LATG, OS process allocation , 13, 1.3, -1, , LATG, OS process , 20, 2, -1, , LATG, OS process: request allocation , 9, .9, -1, , LATG, ksz_so allocation latch , 9, .9, -1, , LATG, ges process table freelist , 9, .9, -1, , LATG, ges process parent latch , 787, 78.7, -1, , LATG, ges process hash list , 1847, 184.7, -1, , LATG, ges resource table freelist , 43, 4.3, -1, , LATG, ges caches resource lists , 260, 26, -1, , LATG, ges resource hash list , 644, 64.4, -1, , LATG, ges resource scan list , 2, .2, -1, , LATG, ges enqueue table freelist , 393, 39.3, -1, , LATG, ges timeout list , 4, .4, -1, , LATG, KJC message pool free list , 24, 2.4, -1, , LATG, KJCT flow control latch , 362, 36.2, -1, , LATG, ges domain table , 379, 37.9, -1, , LATG, ges group table , 571, 57.1, -1, , LATG, gcs resource hash , 246, 24.6, -1, , LATG, gcs opaque info freelist , 44, 4.4, -1, , LATG, gcs resource freelist , 20, 2, -1, , LATG, gcs resource scan list , 2, .2, -1, , LATG, gcs shadows freelist , 8, .8, -1, , LATG, gcs commit scn state , 2, .2, -1, , LATG, name-service request queue , 104, 10.4, -1, , LATG, name-service pending queue , 4, .4, -1, , LATG, name-service namespace bucket , 63, 6.3, -1, , LATG, name-service memory objects , 20, 2, -1, , LATG, gcs partitioned table hash , 24776, 2.48k, -1, , LATG, gcs pcm hashed value bucket hash , 2, .2, -1, , LATG, recovery domain hash list , 2, .2, -1, , LATG, ges value block free list , 2, .2, -1, , LATG, Testing , 2, .2, -1, , LATG, heartbeat check , 5, .5, -1, , LATG, KMG MMAN ready and startup request latch, 4, .4, -1, , LATG, Memory Management Latch , 6, .6, -1, , LATG, mostly latch-free SCN , 20, 2, -1, , LATG, lgwr LWN SCN , 20, 2, -1, , LATG, Consistent RBA , 19, 1.9, -1, , LATG, cache buffers lru chain , 688, 68.8, -1, , LATG, buffer pool , 2, .2, -1, , LATG, active checkpoint queue latch , 24, 2.4, -1, , LATG, checkpoint queue latch , 1202, 120.2, -1, , LATG, cache buffers chains , 2012, 201.2, -1, , LATG, cache buffer handles , 8, .8, -1, , LATG, multiblock read objects , 54, 5.4, -1, , LATG, tablespace key chain , 11, 1.1, -1, , LATG, simulator lru latch , 29, 2.9, -1, , LATG, simulator hash latch , 29, 2.9, -1, , LATG, sim partition latch , 2, .2, -1, , LATG, object queue header operation , 1998, 199.8, -1, , LATG, object queue header heap , 291, 29.1, -1, , LATG, archive control , 1, .1, -1, , LATG, archive process latch , 3, .3, -1, , LATG, redo writing , 87, 8.7, -1, , LATG, redo copy , 149, 14.9, -1, , LATG, redo allocation , 208, 20.8, -1, , LATG, gc element , 582, 58.2, -1, , LATG, logminer work area , 2, .2, -1, , LATG, logminer context allocation , 2, .2, -1, , LATG, logical standby cache , 2, .2, -1, , LATG, mapped buffers lru chain , 2, .2, -1, , LATG, lock DBA buffer during media recovery , 2, .2, -1, , LATG, flashback copy , 2, .2, -1, , LATG, cvmap freelist lock , 2, .2, -1, , LATG, io pool granule metadata list , 2, .2, -1, , LATG, dml lock allocation , 1, .1, -1, , LATG, DML lock allocation , 104, 10.4, -1, , LATG, list of block allocation , 9, .9, -1, , LATG, transaction allocation , 11, 1.1, -1, , LATG, dummy allocation , 17, 1.7, -1, , LATG, transaction branch allocation , 8, .8, -1, , LATG, commit callback allocation , 4, .4, -1, , LATG, undo global data , 46, 4.6, -1, , LATG, corrupted undo seg lock , 3, .3, -1, , LATG, MinActiveScn Latch , 127, 12.7, -1, , LATG, intra txn parallel recovery , 2, .2, -1, , LATG, In memory undo latch , 2, .2, -1, , LATG, KTF sga latch , 2, .2, -1, , LATG, Change Notification Hash table latch , 3, .3, -1, , LATG, change notification client cache latch , 2, .2, -1, , LATG, sort extent pool , 1, .1, -1, , LATG, lob segment hash table latch , 2, .2, -1, , LATG, lob segment query latch , 2, .2, -1, , LATG, lob segment dispenser latch , 2, .2, -1, , LATG, space background task latch , 13, 1.3, -1, , LATG, sequence cache , 9, .9, -1, , LATG, File State Object Pool Parent Latch , 2, .2, -1, , LATG, Write State Object Pool Parent Latch , 2, .2, -1, , LATG, deferred cleanup latch , 1, .1, -1, , LATG, kdlx hb parent latch , 2, .2, -1, , LATG, Locator state objects pool parent latch , 2, .2, -1, , LATG, row cache objects , 435, 43.5, -1, , LATG, kks stats , 2, .2, -1, , LATG, global KZLD latch for mem in SGA , 3, .3, -1, , LATG, user lock , 18, 1.8, -1, , LATG, global ctx hash table latch , 13, 1.3, -1, , LATG, global tx hash mapping , 2, .2, -1, , LATG, k2q lock allocation , 2, .2, -1, , LATG, k2q global data latch , 6, .6, -1, , LATG, shared pool , 455, 45.5, -1, , LATG, Token Manager , 2, .2, -1, , LATG, cas latch , 2, .2, -1, , LATG, rm cas latch , 2, .2, -1, , LATG, resmgr:actses change state , 2, .2, -1, , LATG, resmgr:actses change group , 8, .8, -1, , LATG, resmgr:session queuing , 2, .2, -1, , LATG, resmgr:active threads , 15, 1.5, -1, , LATG, resmgr:plan CPU method , 2, .2, -1, , LATG, resmgr:resource group CPU method , 2, .2, -1, , LATG, QMT , 2, .2, -1, , LATG, shared pool simulator , 2, .2, -1, , LATG, shared pool sim alloc , 2, .2, -1, , LATG, Streams Generic , 2, .2, -1, , LATG, Shared B-Tree , 5, .5, -1, , LATG, Memory Queue , 2, .2, -1, , LATG, Memory Queue Subscriber , 2, .2, -1, , LATG, Memory Queue Message Subscriber #1 , 2, .2, -1, , LATG, Memory Queue Message Subscriber #2 , 2, .2, -1, , LATG, Memory Queue Message Subscriber #3 , 2, .2, -1, , LATG, Memory Queue Message Subscriber #4 , 2, .2, -1, , LATG, pesom_hash_node , 2, .2, -1, , LATG, pesom_free_list , 2, .2, -1, , LATG, peshm , 2, .2, -1, , LATG, Mutex , 2, .2, -1, , LATG, Mutex Stats , 2, .2, -1, , LATG, KGNFS-NFS:SHM structure , 2, .2, -1, , LATG, KGNFS-NFS:SVR LIST , 2, .2, -1, , LATG, SGA heap locks , 2, .2, -1, , LATG, SGA pool locks , 2, .2, -1, , LATG, SGA bucket locks , 2, .2, -1, , LATG, SGA blob parent , 2, .2, -1, , LATG, kgb parent , 2, .2, -1, , LATG, Event Group Locks , 5, .5, -1, , LATG, session timer , 8, .8, -1, , LATG, parameter list , 97, 9.7, -1, , LATG, virtual circuit buffers , 2, .2, -1, , LATG, virtual circuit queues , 2, .2, -1, , LATG, virtual circuit holder , 2, .2, -1, , LATG, cp server hash latch , 2, .2, -1, , LATG, cp pool latch , 2, .2, -1, , LATG, cp cmon/server latch , 2, .2, -1, , LATG, cp sga latch , 1, .1, -1, , LATG, job_queue_processes parameter latch , 2, .2, -1, , LATG, job workq parent latch , 2, .2, -1, , LATG, query server freelists , 15, 1.5, -1, , LATG, error message lists , 5, .5, -1, , LATG, process queue , 19, 1.9, -1, , LATG, process queue reference , 568, 56.8, -1, , LATG, parallel query stats , 8, .8, -1, , LATG, business card , 4, .4, -1, , LATG, parallel query alloc buffer , 38, 3.8, -1, , LATG, SQL memory manager latch , 3, .3, -1, , LATG, SQL memory manager workarea list latch , 215, 21.5, -1, , LATG, compile environment latch , 6, .6, -1, , LATG, kokc descriptor allocation latch , 8, .8, -1, , LATG, ncodef allocation latch , 1, .1, -1, , LATG, session state list latch , 6, .6, -1, , LATG, KWQS pqueue ctx latch , 1, .1, -1, , LATG, KWQS pqsubs latch , 1, .1, -1, , LATG, kwqbsn:qsga , 1, .1, -1, , LATG, qmn task queue latch , 4, .4, -1, , LATG, KWQMN job cache list latch , 3, .3, -1, , LATG, XDB unused session pool , 2, .2, -1, , LATG, XDB used session pool , 2, .2, -1, , LATG, XDB NFS Security Latch , 2, .2, -1, , LATG, KFC Hash Latch , 2, .2, -1, , LATG, KFC FX Hash Latch , 2, .2, -1, , LATG, ASM map operation hash table , 186, 18.6, -1, , LATG, ASM network background latch , 4, .4, -1, , LATG, ASM db client latch , 6, .6, -1, , LATG, KFR redo allocation latch , 2, .2, -1, , LATG, KFCL LE Freelist , 2, .2, -1, , LATG, JS broadcast add buf latch , 2, .2, -1, , LATG, JS broadcast drop buf latch , 2, .2, -1, , LATG, JS broadcast load blnc latch , 2, .2, -1, , LATG, JS slv state obj latch , 2, .2, -1, , LATG, JS queue state obj latch , 312, 31.2, -1, , LATG, JS queue access latch , 2, .2, -1, , LATG, JS Sh mem access , 4, .4, -1, , LATG, PL/SQL warning settings , 9, .9, -- End of snap 1, end=2012-04-02 21:37:13, seconds=10 CH# PARENT_HANDLE OBJECT_HANDLE PLAN_HASH PARSES H_PARSES EXECUTIONS FETCHES ROWS_PROCESSED CPU_MS ELA_MS LIOS PIOS SORTS USERS_EXECUTING ---- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- --------------- 0 000000041CE5D250 000000041CE5D100 1758161767 1 1 0 0 0 154999.436 297611.236 8382945 707682 1 1 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 <-ksedsts()+315<-ksdxfstk()+32<-ksdxcb()+1764<-sspuser()+112<-__restore_rt()<-__lll_lock_wait_private()+30<-_L_lock_1685()+19<-ksmskgmctx_()<-00007FFF36F46508 futex(0x3596554594, FUTEX_WAIT_PRIVATE, 2, NULL #0 0x00000035962e027e in __lll_lock_wait_private () from /lib64/libc.so.6 #1 0x000000359628d56d in _L_lock_1685 () from /lib64/libc.so.6 #2 0x000000359628d2b7 in __tz_convert () from /lib64/libc.so.6 #3 0x0000000001db392a in sldxgd () #4 0x000000000339ddfd in nldatxtmil () #5 0x000000000339dd20 in nldatxt () #6 0x00000000076c6135 in nstimexp () #7 0x00000000064c1bca in ltmdvp () #8 0x00000000064c1ae3 in ltmdrv () #9 0x00000000064a6fc0 in sltmdf () #10 0x0000000006496325 in sslsstehdlr () #11 0x00000000064957e9 in sslsshandler () #12 <signal handler called> #13 0x000000359628d88f in __tzfile_compute () from /lib64/libc.so.6 #14 0x000000359628d2ef in __tz_convert () from /lib64/libc.so.6 #15 0x0000000007ccdbb2 in sLdiGetDate () #16 0x00000000019104b7 in psdsts () #17 0x0000000002b0ffe4 in pessts () #18 0x0000000007dd6fae in pevm_icd_call_common () #19 0x0000000007dce9c3 in pfrinstr_BCAL () #20 0x0000000007dcdc3f in pfrrun_no_tool () #21 0x0000000002a276e7 in pfrrun () #22 0x0000000002a38c7d in plsql_run () #23 0x0000000002a1ebc4 in peicnt () #24 0x0000000002a1ea0d in peiet_execute_trigger () #25 0x00000000023f8080 in kkxtexe () #26 0x00000000019437dc in kxtExecuteTriggerRecursive () #27 0x0000000007c1267a in rpiswu2 () #28 0x0000000001942f66 in kxtExecuteTriggerReal () #29 0x0000000001942d3f in kxtexe () #30 0x0000000001cc468e in qesltcExecuteBeforeRowTriggers () #31 0x0000000007cacc8b in qesltcBeforeRowProcessing () #32 0x0000000007c8c9a3 in __PGOSF606_qerltcKdtBufferedInsRowCBK () #33 0x0000000007c8c47e in qerltcLoadStateMachine () #34 0x0000000007c8bf9d in __PGOSF596_qerltcInsertSelectRop () #35 0x0000000001c8d868 in qerghFetch () #36 0x0000000007afa0a9 in rwsfcd () #37 0x0000000007c8a447 in qerltcFetch () #38 0x0000000007bd5a1d in insexe () #39 0x0000000007c9bfe7 in opiexe () #40 0x0000000001cd1dc5 in kpoal8 () #41 0x0000000007b4e0d4 in opiodr () #42 0x0000000007d0ea8b in ttcpip () #43 0x000000000106800d in opitsk () #44 0x000000000106aa2a in opiino () #45 0x0000000007b4e0d4 in opiodr () #46 0x0000000001063d6c in opidrv () #47 0x000000000177d4c2 in sou2o () #48 0x0000000000975603 in opimai_real () #49 0x0000000001782a79 in ssthrdmain () #50 0x000000000097552f in main ()
This is from metalink.
Bug 6918493 – Net DCD (sqlnet.expire_time>0) can cause OS level mutex hang, possible to get PMON failed to acquired latch
Using Net dead connection detection (DCD) by setting SQLNET.EXPIRE_TIME > 0 in the SQLNET.ORA file can cause an OS level mutex hang under nsexptim -> … -> localtime_r .. and possibly under other substacks under nsexptim . The hang is very intermittent in nature as it depends on a specific timing scenario.
If the hung process is holding a critical resource, such as a latch, then this can lead to a wider hang scenario.
Rediscovery Notes:
Dead Connection Detection is enabled (SQLNET.EXPIRE_TIME > 0 )
The stack includes nsexptim and is likely to include
localtime_tz_r or localtime_r
The top of the stack shows the session blocked on an
OS mutex. (eg: In __lll_mutex_lock_wait on Linux, or in a
futex call)
If the hung process holds a latch, or other critical
resource, then PMON may give a clue in its tracefile.
Workaround:
Disable DCD and configure OS level keepalive instead.
eg: Set SQLNET.EXPIRE_TIME=0 in the SQLNET.ORA file
and configure TCP keepalive to the required timeouts
at OS level (exact parameters and steps vary between
platforms as TCP keepalive is an OS level configuration)