发表于: 2008.06.04 23:26
分类: ORACLE , Bug
出处: http://yangtingkun.itpub.net/post/468/463724
---------------------------------------------------------------
其实这个错误真正对应的错误函数为kxhrPUcompare,ORA-7445错误诊断页面不认为_fini是一个正确的输入函数,但是由于以kxhrPUcompare开头的错误变化多样,基本上没有什么共性,而以_fini开头后面跟随kxhrPUcompare的错误还有一点规律,因此这里选择了_fini作为错误函数。
ORA-7445的kxhrPUcompare错误已经碰到过很多次了,但是没有什么规律可言,甚至连下面这种最简单的SQL都可能出现这个错误:
select a.*,b.name_chn as drug_name,b.code as drug_code from cat_drug_alias a, cat_drug b where a.medical_id = b.id
不过在分析了多个错误后,还是发现了kxhrPUcompare错误的一点共性,就是这个错误总是出现在HASH JOIN操作中。
上面的这个SQL就是一个典型,甚至不用看执行计划,就可以判断,这个SQL采用的是HASH JOIN。
而从错误后面紧跟的其他错误函数看,一般不是qerhjProbeHashTable就是qerhjWalkHashBucket,这说明了kxhrPUcompare错误是在进行HASH JOIN是出现的问题。由于错误很难重现,且每次错误的SQL语句没有什么共性,因此只能推测是9204版本上的一个与HASH JOIN有关的bug。
不过最近连续几次出现了一个相同的SQL,每次产生的错误都是以_fini开头,后面跟随kxhrPUcompare错误。个人怀疑和SQL本身的写法有关。
对应alert文件中的错误信息为:
Errors in file /opt/oracle/admin/data01/udump/data01_ora_12231.trc:
ORA-07445: exception encountered: core dump [00000001035DB088] [SIGILL] [Illegal opcode] [0x1035DB088] [] []
而对应的trace文件中详细错误如下:
*** 2008-06-02 09:24:24.360
*** SESSION ID:(90.15349) 2008-06-02 09:24:24.338
Exception signal: 4 (SIGILL), code: 1 (Illegal opcode), addr: 0x1035db088, PC: [0x1035db088, 00000001035DB088]
*** 2008-06-02 09:24:24.364
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [00000001035DB088] [SIGILL] [Illegal opcode] [0x1035DB088] [] []
Current SQL statement for this session:
select count(*) from (select nvl(deal.temp_flag,0) as temp_flag, b.url,c.member_flag,b.address, b.plat_id, b.used_name, c.code ,c.na
me ,b.check_flag, b.factory_flag, b.saler_flag, b.send_flag, c.abbr ,c.id, d.plat_name,d.plat_father from cat_enterprise b,cat_org c
, plt_plat d ,CAT_DEALER deal where b.id = c.id and b.plat_id = d.id and (c.name like '%天津%' or c.abbr
like '%天津%' or b.used_name like '%天津%' or (exists (select 1 from cat_manufactur
e cm, cat_org c where cm.molecule_orgid = c.id and cm.id=b.id and c.abbr || c.code|| c.name_wb|| c.spell_abbr || c.name like :1 )))
and c.ENABLE_FLAG = :2 and deal.id(+)=b.id )
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+328 CALL ksedst()+0 FFFFFFFF7FFF9580 ?
000000000 ? 000000000 ?
00000003E ?
FFFFFFFF7FFF9E18 ?
1031D6908 ?
ssexhd()+604 CALL ksedmp()+0 000000000 ? 000103400 ?
0001035DA ? 000102C00 ?
1035DA000 ? 1035DAE28 ?
sigacthandler()+44 PTR_CALL 0000000000000000 1035E2000 ?
FFFFFFFF7FFFADB0 ?
000000000 ? 000000001 ?
1035DFFD8 ? 000000004 ?
_fini()+10507384 PTR_CALL 0000000000000000 000000004 ?
FFFFFFFF7FFFADB0 ?
FFFFFFFF7FFFAAD0 ?
000000004 ?
3BE94AAB5D37D461 ?
79EBCAED4C37A3FA ?
kxhrPUcompare()+920 CALL ksxb1inc()+0 FFFFFFFF7FFFB170 ?
0FFFEFFF7 ? 000000000 ?
000000000 ? 000000028 ?
1035E2000 ?
qerhjProbeHashTable CALL kxhrPUcompare()+0 0FFFFFFF6 ?
()+204 FFFFFFFF7FFFB170 ?
40154A09ED0 ? 000010001 ?
000010001 ? 000000005 ?
qertbFetch()+1000 PTR_CALL 0000000000000000 FFFFFFFF7FFFBAA0 ?
000000000 ? 000000007 ?
000000080 ? 0001035E2 ?
000000001 ?
rwsfcd()+112 PTR_CALL 0000000000000000 400660F80DC ?
FFFFFFFF7FFFB4E0 ?
000000000 ? 000000001 ?
400ED20A588 ?
FFFFFFFF7CF765A8 ?
由于引发多次错误的SQL是相同的,且错误函数也完全一样,因此怀疑导致错误的原因是SQL本身造成的。这个SQL在多个OR条件中的一个包含了一个EXISTS的连接,使得这个连接成为多个OR条件的一个,也就是说,这个EXISTS连接只能作为部分条件与其他表连接。无论从性能上还是从写法上,这个SQL都存在问题:
SQL> select count(*) from
2 (
3 select nvl(deal.temp_flag,0) as temp_flag, b.url,c.member_flag,b.address, b.plat_id, b.used_name, c.code ,
4 c.name ,b.check_flag, b.factory_flag, b.saler_flag, b.send_flag, c.abbr ,c.id, d.plat_name, d.plat_father
5 from cat_enterprise b,cat_org c, plt_plat d ,CAT_DEALER deal
6 where b.id = c.id
7 and b.plat_id = d.id
8 and (c.name like '%天津%'
9 or c.abbr like '%天津%'
10 or b.used_name like '%天津%'
11 or (exists
12 (select 1 from cat_manufacture cm, cat_org c
13 where cm.molecule_orgid = c.id
14 and cm.id=b.id
15 and c.abbr || c.code|| c.name_wb|| c.spell_abbr || c.name like '%天津%' )
16 )
17 )
18 and c.ENABLE_FLAG = 1
19 and deal.id(+)=b.id
20 );
COUNT(*)
----------
312
Elapsed: 00:00:01.22
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=355 Card=1 Bytes=169)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 NESTED LOOPS (OUTER) (Cost=355 Card=3575 Bytes=604175)
4 3 NESTED LOOPS (Cost=355 Card=3575 Bytes=514800)
5 4 HASH JOIN (Cost=355 Card=3575 Bytes=425425)
6 5 TABLE ACCESS (FULL) OF 'CAT_ORG' (Cost=129 Card=19276 Bytes=1233664)
7 5 TABLE ACCESS (FULL) OF 'CAT_ENTERPRISE' (Cost=195 Card=26322 Bytes=1447710)
8 4 INDEX (UNIQUE SCAN) OF 'PK_PLT_PLAT' (UNIQUE)
9 3 INDEX (UNIQUE SCAN) OF 'PK_CAT_DEALER' (UNIQUE)
10 2 NESTED LOOPS (Cost=3 Card=1 Bytes=120)
11 10 TABLE ACCESS (BY INDEX ROWID) OF 'CAT_MANUFACTURE' (Cost=2 Card=1 Bytes=27)
12 11 INDEX (UNIQUE SCAN) OF 'PK_CAT_MANUFACTURE' (UNIQUE) (Cost=1 Card=8469)
13 10 TABLE ACCESS (BY INDEX ROWID) OF 'CAT_ORG' (Cost=1 Card=1 Bytes=93)
14 13 INDEX (UNIQUE SCAN) OF 'PK_CAT_ORG' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
86956 consistent gets
0 physical reads
0 redo size
492 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
由于OR条件中的子查询语句的存在,Oracle选择了FILTER执行计划来过滤条件,可以看到这个SQL的逻辑读需要86956,而执行时间也超过了1秒,下面根据表的主外键关系等信息等价的改写SQL:
SQL> select count(*) from
2 (
3 select nvl(deal.temp_flag,0) as temp_flag, b.url,c.member_flag,b.address, b.plat_id, b.used_name, c.code ,
4 c.name ,b.check_flag, b.factory_flag, b.saler_flag, b.send_flag, c.abbr ,c.id, d.plat_name, d.plat_father
5 from cat_enterprise b,cat_org c, plt_plat d ,CAT_DEALER deal,
6 (
7 select cm.id,
8 c.abbr || c.code|| c.name_wb|| c.spell_abbr || c.name name
9 from cat_manufacture cm, cat_org c
10 where cm.molecule_orgid = c.id
11 ) m
12 where b.id = c.id
13 and b.plat_id = d.id
14 and (c.name like '%天津%'
15 or c.abbr like '%天津%'
16 or b.used_name like '%天津%'
17 or m.name like '%天津%')
18 and m.id(+) = c.id
19 and c.ENABLE_FLAG = 1
20 and deal.id(+)=b.id
21 );
COUNT(*)
----------
312
1 row selected.
Elapsed: 00:00:00.68
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=19631 Card=1 Bytes=289)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (OUTER) (Cost=19631 Card=19276 Bytes=5570764)
3 2 NESTED LOOPS (Cost=19631 Card=19276 Bytes=5088864)
4 3 FILTER
5 4 NESTED LOOPS (OUTER)
6 5 HASH JOIN (Cost=355 Card=19276 Bytes=2293844)
7 6 TABLE ACCESS (FULL) OF 'CAT_ORG' (Cost=129 Card=19276 Bytes=1233664)
8 6 TABLE ACCESS (FULL) OF 'CAT_ENTERPRISE' (Cost=195 Card=26322 Bytes=1447710)
9 5 VIEW PUSHED PREDICATE (Cost=1 Card=1 Bytes=120)
10 9 NESTED LOOPS (Cost=3 Card=1 Bytes=120)
11 10 TABLE ACCESS (BY INDEX ROWID) OF 'CAT_MANUFACTURE' (Cost=2 Card=1 Bytes=27)
12 11 INDEX (UNIQUE SCAN) OF 'PK_CAT_MANUFACTURE' (UNIQUE) (Cost=1 Card=8469)
13 10 TABLE ACCESS (BY INDEX ROWID) OF 'CAT_ORG' (Cost=1 Card=1 Bytes=93)
14 13 INDEX (UNIQUE SCAN) OF 'PK_CAT_ORG' (UNIQUE)
15 3 INDEX (UNIQUE SCAN) OF 'PK_PLT_PLAT' (UNIQUE)
16 2 INDEX (UNIQUE SCAN) OF 'PK_CAT_DEALER' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
30649 consistent gets
0 physical reads
0 redo size
492 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
修改后的SQL逻辑读下降为30649,执行时间为0.68秒。
不过修改之后的SQL是否能避免ORA-7445错误的出现,就要拭目以待了。











