yangtingkun
===========================================================
ORA-7445(qcpircx)错误
===========================================================

Oracle9204 for Linux X86-64上碰到一个奇怪的错误ORA-7445(qcpircx)错误。


这个问题奇怪的地方之一是:发现这个错误后,发现这个错误没有办法进行跟踪。

从后台alert文件中发现两个错误信息:

Mon Mar 10 19:36:47 2008
Errors in file /data/admin/testcen/udump/testcen_ora_10840.trc:
ORA-07445: exception encountered: core dump [qcpircx()+259] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
Mon Mar 10 19:37:30 2008
Errors in file /data/admin/testcen/udump/testcen_ora_17650.trc:
ORA-07445: exception encountered: core dump [qcpircx()+259] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []

但是查询进一步的详细信息时,发现没有对应的信息:

[oracle@datasrv2 bdump]$ more /data/admin/testcen/udump/testcen_ora_10840.trc
[oracle@datasrv2 bdump]$ more /data/admin/testcen/udump/testcen_ora_17650.trc
[oracle@datasrv2 bdump]$ ls -l /data/admin/testcen/udump/testcen_ora_10840.trc
-rw-r----- 1 oracle oinstall 0 Mar 10 19:36 /data/admin/testcen/udump/testcen_ora_10840.trc
[oracle@datasrv2 bdump]$ ls -l /data/admin/testcen/udump/testcen_ora_17650.trc
-rw-r----- 1 oracle oinstall 0 Mar 10 19:37 /data/admin/testcen/udump/testcen_ora_17650.trc

alert文件中产生了7445错误,但是对应的trace文件并没有生成,这种情况还是比较少见的。没有详细的信息就没有办法进一步的跟踪了,不清楚导致错误的SQL,也不清楚到底是哪个会话发出的SQL

唯一的办法只有到metalinkqcpircx关键字查询ORA-7445错误,结果发现metalink上对于这个错误的描述也是很奇怪的。

首先在文档Doc ID: Note:3855374.8中描述倒是比较正常,这个bug是由于使用分析函数造成的,且错误发生在分析阶段。确认影响的版本为920510.1.0.4Oracle92081010510201中修正了这个bug

不过错误描述也太简单了一点,只提到一个分析函数就完了,这种错误描述基本上没有什么参考价值,看来只能参考原始的bug信息。

打开Bug No. 3855374,文章中并没有给出原始的错误信息,但是给出了一个奇怪的重现bug的用例:

SQL> create table test (c1 number,c2 number, c3 number,c4 number);

Table created.

SQL> select distinct c2, avg(c4) over (partition by c2 order by c2)
2 from test
3 where
4 (
5 c3 = all
6 (
7 select max (c3) from test
8 where avg(c4) over (partition by c2 order by c2) = avg(c4) over (partition by c2 order by c2)
9 )
10 );
select distinct c2, avg(c4) over (partition by c2 order by c2)
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

这个测试的例子就很奇怪了,熟悉分析函数就应该清楚,分析函数是不能直接用在WHERE语句中的,这个SQL根本就编译不过。不过这也变相证明了错误确实是发生在分析阶段。

而且和我碰到的问题不同的是,这个SQL引发的后台ORA-7445错误产生了trace文件:

Thu Apr 3 15:44:41 2008
Errors in file /data/admin/testcen/udump/testcen_ora_32024.trc:
ORA-07445: exception encountered: core dump [qcpircx()+259] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []

但是trace文件也比较奇怪:

[oracle@datasrv2 bdump]$ more /data/admin/testcen/udump/testcen_ora_32024.trc
/data/admin/testcen/udump/testcen_ora_32024.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2
System name: Linux
Node name: datasrv2
Release: 2.6.9-5.ELsmp
Version: #1 SMP Wed Jan 5 19:29:47 EST 2005
Machine: x86_64
Instance name: testcen
Redo thread mounted by this instance: 1
Oracle process number: 38
Unix process pid: 32024, image: oracle@datasrv2 (TNS V1-V3)

*** 2008-04-03 15:44:41.168
*** SESSION ID:(12.11637) 2008-04-03 15:44:41.167
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x0, PC: [0x1762c93, qcpircx()+259]
*** 2008-04-03 15:44:41.232
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [qcpircx()+259] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
No current SQL statement being executed.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
Cannot find symbol
Cannot find symbol
ksedmp()+226 call ksedst() 000000001 ? 000000001 ?
10C0DDE646FE52F0 ?
000000000 ? 00003A3F6 ?
005D56BC2 ?
ssexhd()+912 call ksedmp() 000000067 ? 000000001 ?
10C0DDE646FE52F0 ?
000000000 ? 00003A3F6 ?
005D56BC2 ?
__pthread_sighandle call ssexhd() 00000000B ? 7FBFFF8ED0 ?
r_rt()+104 00000001C ? 000000000 ?
00003A3F6 ? 005D56BC2 ?
killpg()+48 call __pthread_sighandle 2A9615A4E0 ? 7FBFFF8ED0 ?
r_rt() 7FBFFF8DA0 ? 000000000 ?
00003A3F6 ? 005D56BC2 ?
<(nil)> call killpg() 2A9615A4E0 ? 7FBFFF8ED0 ?
7FBFFF8DA0 ? 000000000 ?

Oracle并没有记录下产生错误的SQL语句。

文档中给出的第二个用例也很奇怪:

SQL> CONN TEST/TEST
Connected.
SQL> select c2, avg(c4) over (partition by c2 order by c2)
2 from test
3 where
4 (
5 c3 =
6 (
7 select max(c3)
8 from test
9 where avg(c4) over (partition by c2 order by c2)
10 )
11 );
select c2, avg(c4) over (partition by c2 order by c2)
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

又是一个完全不通的语法,WHERE语句确实判断条件。似乎只要在子查询的WHERE语句中包含父查询中出现过的分析函数,就会引发问题。

但是奇怪的是,通过自己构建的例子,没有模拟出来:

SQL> CONN TEST/TEST
Connected.
SQL> DROP TABLE T;

Table dropped.

SQL> CREATE TABLE T AS SELECT ROWNUM ID, A.* FROM ALL_OBJECTS A WHERE 1=2;

Table created.

SQL> SELECT OBJECT_ID, AVG(DATA_OBJECT_ID) OVER(PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
2 FROM T
3 WHERE ID =
4 (
5 SELECT MAX(ID)
6 FROM T
7 WHERE AVG(DATA_OBJECT_ID) OVER(PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
8 )
9 ;
WHERE AVG(DATA_OBJECT_ID) OVER(PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
*
ERROR at line 7:
ORA-00934: group function is not allowed here

其实就是将表TEST换成了T,将C2换成了OBJECT_ID,将C3换成了ID,将C4换成了DATA_OBJECT_ID,而且OBJECT_ID, IDDATA_OBJECT_ID列都是NUMBER类型,其他没有任何的变化,而错误就没能重现。

尝试多次后发现,问题居然出现在括号处,对比下面两个语句:

SQL> SELECT OBJECT_ID, AVG(DATA_OBJECT_ID) OVER (PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
2 FROM T
3 WHERE ID =
4 (
5 SELECT MAX(ID)
6 FROM T
7 WHERE AVG(DATA_OBJECT_ID) OVER (PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
8 )
9 ;
WHERE AVG(DATA_OBJECT_ID) OVER (PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
*
ERROR at line 7:
ORA-00934: group function is not allowed here


SQL> SELECT OBJECT_ID, AVG(DATA_OBJECT_ID) OVER (PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
2 FROM T
3 WHERE
4 (
5 ID =
6 (
7 SELECT MAX(ID)
8 FROM T
9 WHERE AVG(DATA_OBJECT_ID) OVER (PARTITION BY OBJECT_ID ORDER BY OBJECT_ID)
10 )
11 );
FROM T
*
ERROR at line 8:
ORA-03113: end-of-file on communication channel

问题居然发生在WHERE条件后面的括号上,原则上讲,这里是否添加括号对SQL语法和逻辑没有任何的影响,但是就是多了这个括号,使得原本正常应该分析给出语法错误的情况,变成了一个ORA-7445错误。

这个错误一方面说明920上的分析函数还存在很多bug,另一方面也说明随意给OracleSQL语句添加一些没有必要的零碎是很不好的习惯。

另一个添加零碎语句造成的问题:http://yangtingkun.itpub.net/post/468/248790

yangtingkun 发表于:2008.04.05 23:20 ::分类: ( ORACLE , Bug ) ::阅读:(451次) :: 评论 (0)

发表评论
标题

在此添加评论
表情符号: smile laughing tongue angry crying sad wassat wink

称呼

邮箱地址(可选)

个人主页(可选)

 authimage


切换风格
新闻聚合
博客日历
文章归档...
最新发表...
最新评论...
最多阅读文章...
最多评论文章...
博客统计...
Blog信息
网站链接...