发表于: 2008.04.08 23:57
分类: ORACLE , Bug
出处: http://yangtingkun.itpub.net/post/468/459318
---------------------------------------------------------------
在Oracle Rac 10.2.0.3 for Solaris环境中,发现了这个7445错误。
后台alert文件中,错误信息如下:
Errors in file /oracle/app/admin/pre/udump/prerac1_ora_24271.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: user requested cancel of current operation
Thu Oct 11 23:54:59 2007
Errors in file /oracle/app/admin/pre/udump/prerac1_ora_24271.trc:
ORA-07445: exception encountered: core dump [kkttrex()+988] [SIGSEGV] [Address not mapped to object] [0x000000074] [] []
根据Metalink上查询结果,并未发现10.2上相关的错误信息。和这个7445错误相关的有两个已知bug,一个和逻辑STANDBY有关,这显然与当前不符。另一个和RAC中的登陆触发器有关,这一点倒是比较相符,不过这个bug在10.1.0.2中已经修正了,莫非这个问题在10.2.0.3中又重现引入了。关于这个bug的描述,可以参考metalink的文档:Doc ID: Note:3047933.8。
下面分析一下详细的trace文件:
*** 2007-10-11 23:54:59.767
Error in executing triggers on instance shutdown
*** 2007-10-11 23:54:59.802
ksedmp: internal or fatal error
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: user requested cancel of current operation
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x74, PC: [0x101a4901c, kkttrex()+988]
*** 2007-10-11 23:54:59.815
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kkttrex()+988] [SIGSEGV] [Address not mapped to object] [0x000000074] [] []
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+744 CALL ksedst() 000000840 ? 1066DE17C ?
000000000 ? 1066DAC70 ?
1066D99D8 ? 1066DA3D8 ?
ssexhd()+1240 CALL ksedmp() 000106400 ? 10652D264 ?
10652D000 ? 00010652D ?
000106400 ? 10652D264 ?
__sighndlr()+12 PTR_CALL 0000000000000000 10652A000 ? 1066E1EF0 ?
10652722C ? 00010652A ?
00000000B ? 000000067 ?
call_user_handler() CALL __sighndlr() 00000000B ? 1066E1EF0 ?
+992 1066E1C10 ? 10033B1C0 ?
000000000 ? 00000000A ?
kkttrex()+924 CALL kglpin() FFFFFFFF7B502000 ?
FFFFFFFF7B502000 ?
1066E1C10 ? 000000009 ?
000000000 ? 000000000 ?
kktexeevt0()+856 CALL kkttrex() 106534000 ? 3CFF14A38 ?
106527000 ? 000106534 ?
000000000 ? 106534000 ?
从trace文件中的这部分内容已经可以看到,问题出现在关闭数据库时执行的触发器中,下面继续检查会话信息:
SO: 3d767a168, type: 4, owner: 3d74839f8, flag: INIT/-/-/0x00
(session) sid: 276 trans: 0, creator: 3d74839f8, flag: (c0000041) USR/- BSY/-/-/-/-/-
DID: 0001-0018-00006D54, short-term DID: 0001-0018-00006D55
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
O/S info: user: oracle, term: pts/1, ospid: 24270, machine: pre1
program: sqlplus@pre1 (TNS V1-V3)
application name: sqlplus@pre1 (TNS V1-V3), hash value=0
last wait for 'library cache pin' blocking sess=0x0 seq=215 wait_time=390 seconds since wait started=0
handle address=3d65969b0, pin address=3d0a4da78, 100*mode+namespace=cb
Dumping Session Wait History
for 'library cache pin' count=1 wait_time=390
handle address=3d65969b0, pin address=3d0a4da78, 100*mode+namespace=cb
for 'db file sequential read' count=1 wait_time=501
file#=1, block#=32e2, blocks=1
for 'db file sequential read' count=1 wait_time=7030
file#=1, block#=36ff, blocks=1
for 'db file sequential read' count=1 wait_time=1261
file#=1, block#=2a07, blocks=1
for 'db file sequential read' count=1 wait_time=535
file#=1, block#=4a8f, blocks=1
for 'db file sequential read' count=1 wait_time=7415
file#=1, block#=2b09, blocks=1
for 'rdbms ipc reply' count=1 wait_time=7026
from_process=11, timeout=147ae14, =0
for 'db file sequential read' count=1 wait_time=3225
file#=1, block#=738d, blocks=1
for 'row cache lock' count=1 wait_time=222
cache id=8, mode=0, request=3
for 'library cache pin' count=1 wait_time=521
handle address=3c1d2b748, pin address=3d069fcb8, 100*mode+namespace=cb
temporary object counter: 0
----------------------------------------
.
.
.
----------------------------------------
SO: 3d069e908, type: 53, owner: 3d767a168, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=3d069e908 handle=3c1d2b748 mode=N
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=3d069e988[3d0db4140,3d0db4140] htb=3d0db4140 ssga=3d0db3c48
user=3d767a168 session=3d767a168 count=0 flags=LRU/[4000] savepoint=0x29e
LIBRARY OBJECT HANDLE: handle=3c1d2b748 mtx=3c1d2b878(0) cdp=0
name=SYS.AURORA$SERVER$SHUTDOWN
hash=67d07c6755755ab3d0fc5cd9b49b003e timestamp=04-18-2007 17:58:17
namespace=TRGR flags=KGHP/TIM/XLR/[00000020]
kkkk-dddd-llll=0000-0000-0000 lock=N pin=0 latch#=3 hpc=0002 hlc=0002
lwt=3c1d2b7f0[3c1d2b7f0,3c1d2b7f0] ltm=3c1d2b800[3c1d2b800,3c1d2b800]
pwt=3c1d2b7b8[3c1d2b7b8,3c1d2b7b8] ptm=3c1d2b7c8[3c1d2b7c8,3c1d2b7c8]
ref=3c1d2b820[3c1d2b820,3c1d2b820] lnd=3c1d2b838[3c1d2b838,3c1d2b838]
LOCK INSTANCE LOCK: id=LD67d07c6755755ab3
PIN INSTANCE LOCK: id=ND67d07c6755755ab3 mode=S release=F flags=[00]
INVALIDATION INSTANCE LOCK: id=IV000090e712123b12 mode=S
LIBRARY OBJECT: object=3cfa25aa8
DATA BLOCKS:
data# heap pointer status pins change whr
----- -------- -------- --------- ---- ------ ---
0 3d64f7db8 3cfa25c40 I/-/A/-/- 0 NONE 00
----------------------------------------
.
.
.
----------------------------------------
SO: 3d07cc088, type: 53, owner: 3d0788d58, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=3d07cc088 handle=3d65969b0 mode=N
call pin=3d0a4da78 session pin=0 hpc=0000 hlc=0000
htl=3d07cc108[3d0db3fe0,3d0db3fe0] htb=3d0db3fe0 ssga=3d0db3c48
user=3d767a168 session=3d767a168 count=1 flags=CBB/PNC/[0600] savepoint=0x2d7
LIBRARY OBJECT HANDLE: handle=3d65969b0 mtx=3d6596ae0(0) cdp=0
name=SYS.OLAPISHUTDOWNTRIGGER
hash=7c8f4a27e2d4d40052c5cf8c7b67bb28
namespace=TRGR flags=KGHP/XLR/[00000020]
kkkk-dddd-llll=0000-0000-0000 lock=N pin=S latch#=3 hpc=0002 hlc=0002
lwt=3d6596a58[3d6596a58,3d6596a58] ltm=3d6596a68[3d6596a68,3d6596a68]
pwt=3d6596a20[3d6596a20,3d6596a20] ptm=3d6596a30[3d6596a30,3d6596a30]
ref=3d6596a88[3d6596a88,3d6596a88] lnd=3d6596aa0[3d6596aa0,3d6596aa0]
LOCK INSTANCE LOCK: id=LD7c8f4a27e2d4d400
PIN INSTANCE LOCK: id=ND7c8f4a27e2d4d400 mode=S release=F flags=[00]
LIBRARY OBJECT: object=3cff148e0
DATA BLOCKS:
data# heap pointer status pins change whr
----- -------- -------- --------- ---- ------ ---
0 3c1bee4d0 0 I/P/A/-/- 0 NONE 00
----------------------------------------
SO: 3d76b2fa8, type: 4, owner: 3d0788d58, flag: INIT/-/-/0x00
(session) sid: 320 trans: 0, creator: 0, flag: (2) -/REC -/-/-/-/-/-
DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
temporary object counter: 0
----------------------------------------
SO: 3d067d360, type: 50, owner: 3d0788d58, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3d767a168 object=3c1789620, mode=S
savepoint=0x29e
row cache parent object: address=3c1789620 cid=8(dc_objects)
hash=9344dbc9 typ=11 transaction=0 flags=00000002
own=3c17896f0[3d067d390,3d067d390] wat=3c1789700[3c1789700,3c1789700] mode=S
status=VALID/-/-/-/-/-/-/-/-
request=N release=FALSE flags=0
instance lock id=QI 4bbe8f92 abac9b13
set=0, complete=FALSE
data=
00000000 00164155 524f5241 24534552 56455224 53485554 444f574e 00000000
00000000 03000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 000090e7 ffff0000 ffffffff 0c786b04 1210351d
786b0412 123b1278 6b041212 3b120100 00000000 00000000 00000000 00000000
00000000 00000006
----------------------------------------
SO: 3d08ef9d8, type: 54, owner: 3d0788d58, flag: INIT/-/-/0x00
LIBRARY OBJECT PIN: pin=3d08ef9d8 handle=3d6681dc8 mode=S lock=3d0784a68
user=3d767a168 session=3d767a168 count=1 mask=0009 savepoint=0x29e flags=[00]
----------------------------------------
SO: 3d05f1ea8, type: 16, owner: 3d74839f8, flag: INIT/-/-/0x00
(osp req holder)
从trace的会话信息看,主会话是sys登陆的SQLPLUS,这是应该执行的是SHUTDOWN操作,随后trace中保留的两个部分,是系统总仅有的两个SHUTDOWN触发器,而TRACE信息就是在第二个SHUTDOWN触发器处结束的。
SQL> SELECT OWNER, TRIGGER_NAME, TRIGGER_TYPE, TRIGGERING_EVENT
2 FROM DBA_TRIGGERS
3 WHERE TRIGGERING_EVENT LIKE 'SHUT%';
OWNER TRIGGER_NAME TRIGGER_TYPE TRIGGERING_EVENT
---------- ------------------------------ ---------------- --------------------
SYS AURORA$SERVER$SHUTDOWN BEFORE EVENT SHUTDOWN
SYS OLAPISHUTDOWNTRIGGER BEFORE EVENT SHUTDOWN
看来这个问题确实和第二个bug比较类似,首先问题都是RAC环境下的问题,其次,问题都是触发器引起的,不过bug中描述的是登陆触发器,而现在碰到的问题是SHUTDOWN触发器。
不过问题和bug也有不一样之处,首先这个问题只出现过一次,而且在出现的时候系统本身的状态也是有问题的。
由于为RAC配置了ASM环境,且由于ASM本身设置的问题,导致ASM产生的进程数超过了ASM实例设置的最大进程数,从而导致归档甚至数据文件的读写都出现错误,导致后台出现大量的ORA-15055和ORA-00020错误。
也就是说从数据库关闭命令开始到ORA-7445错误的出现,到后面其他进程的关闭直到数据库重现启动,一直伴随这这些错误信息:
Thu Oct 11 23:47:36 2007
Shutting down instance: further logons disabled
EMN0 started with pid=44, OS id=24464
Thu Oct 11 23:47:36 2007
ksvsubmit: Process(O000) creation failed
Thu Oct 11 23:47:36 2007
ksvsubmit: Process(O000) creation failed
Thu Oct 11 23:47:36 2007
Stopping background process QMNC
Thu Oct 11 23:47:38 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc3_15833.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '+DATA/onlinelog/redo1_2_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_2_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 2 thread 1: '+DATA/onlinelog/redo1_2_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_2_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
Thu Oct 11 23:47:38 2007
.
.
.
Thu Oct 11 23:54:53 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc0_13665.trc:
ORA-00313: open failed for members of log group 4 of thread 1
ORA-00312: online log 4 thread 1: '+DATA/onlinelog/redo1_4_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_4_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
Thu Oct 11 23:54:55 2007
Archiver process freed from errors. No longer stopped
Thu Oct 11 23:54:55 2007
ARCH: Archival stopped, error occurred. Will continue retrying
Thu Oct 11 23:54:55 2007
ORACLE Instance prerac1 - Archival Error
Thu Oct 11 23:54:55 2007
ORA-16014: log 4 sequence# 52 not archived, no available destinations
ORA-00312: online log 4 thread 1: '+DATA/onlinelog/redo1_4_1.log'
ORA-00312: online log 4 thread 1: '+DATA/onlinelog/redo1_4_2.log'
Thu Oct 11 23:54:55 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc2_15831.trc:
ORA-16014: log 4 sequence# 52 not archived, no available destinations
ORA-00312: online log 4 thread 1: '+DATA/onlinelog/redo1_4_1.log'
ORA-00312: online log 4 thread 1: '+DATA/onlinelog/redo1_4_2.log'
Thu Oct 11 23:54:57 2007
Thread 1 advanced to log sequence 55
Current log# 2 seq# 55 mem# 0: +DATA/onlinelog/redo1_2_1.log
Current log# 2 seq# 55 mem# 1: +DATA/onlinelog/redo1_2_2.log
Thu Oct 11 23:54:57 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_j003_22073.trc:
ORA-12012: error on auto execute of job 300
ORA-04063: package body "GPO.PKG_BALANCE" has errors
ORA-06508: PL/SQL: could not find program unit being called: "GPO.PKG_BALANCE"
ORA-06512: at line 1
Thu Oct 11 23:54:58 2007
PMON deletion of process succeeded
Thu Oct 11 23:54:59 2007
Background process MMON not dead after 30 seconds
Thu Oct 11 23:54:59 2007
Killing background process MMON
Thu Oct 11 23:54:59 2007
Errors in file /oracle/app/admin/pre/udump/prerac1_ora_24271.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: user requested cancel of current operation
Thu Oct 11 23:54:59 2007
Errors in file /oracle/app/admin/pre/udump/prerac1_ora_24271.trc:
ORA-07445: exception encountered: core dump [kkttrex()+988] [SIGSEGV] [Address not mapped to object] [0x000000074] [] []
Thu Oct 11 23:55:00 2007
Stopping background process CJQ0
Thu Oct 11 23:55:01 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc3_15833.trc:
ORA-00313: open failed for members of log group 4 of thread 1
ORA-00312: online log 4 thread 1: '+DATA/onlinelog/redo1_4_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_4_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
Thu Oct 11 23:55:01 2007
.
.
.
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc0_13665.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '+DATA/onlinelog/redo1_3_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_3_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 3 thread 1: '+DATA/onlinelog/redo1_3_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_3_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
Fri Oct 12 00:03:39 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_j000_17244.trc:
ORA-12012: error on auto execute of job 295
ORA-02019: connection description for remote database not found
ORA-06512: at "GPO.P_SYNC_MBR", line 6
ORA-06512: at line 1
Fri Oct 12 00:05:56 2007
Archiver process freed from errors. No longer stopped
Fri Oct 12 00:07:07 2007
MMNL absent for 1201 secs; Foregrounds taking over
Fri Oct 12 00:08:13 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_asmb_11533.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Fri Oct 12 00:08:13 2007
ASMB: terminating instance due to error 15064
Fri Oct 12 00:08:13 2007
System state dump is made for local instance
System State dumped to trace file /oracle/app/admin/pre/bdump/prerac1_diag_11382.trc
从上面相对完整的alert信息中可以看到,shutdown过程中,一直伴随这ASM的错误,因此从这个角度上讲,这个ORA-7445错误的产生应该与ASM实例无法正常工作有必然的联系,而trace文件中最后的部分内容也证实了这一点:
3CD003790 4F52412D 30303032 303A206D 6178696D [ORA-00020: maxim]
3CD0037A0 756D206E 756D6265 72206F66 2070726F [um number of pro]
3CD0037B0 63657373 65732028 29206578 63656564 [cesses () exceed]
3CD0037C0 65640A00 00000000 00000000 00000000 [ed..............]
基于上面的这些推断,这个bug的产生应该是很偶然的,而且本身也不会造成什么危害。
关于ASM实例出现的ORA-15055错误和ORA-00020错误,可以参考下面的系列文章:
连接ASM实例后出现ORA-1012错误:连接ASM实例后出现ORA-1012错误
连接ASM实例后出现ORA-1012错误(二):连接ASM实例后出现ORA-1012错误(二)
连接ASM实例后出现ORA-1012错误(三)连接ASM实例后出现ORA-1012错误(三)











