yangtingkun
===========================================================
FIRST_ROWS优化模式访问远端表可能导致错误结果(一)
===========================================================

Oracle9204上碰到了这个问题,FIRST_ROWS优化模式下,通过数据库链访问远端表,且SQL包括IN子查询语句时,优化器可能给出错误的执行计划,从而导致结果集出现重复记录。


下面通过一个例子来重现这个问题:

SQL> CONN YANGTK/YANGTK@TEST1已连接。
SQL> CREATE TABLE T AS SELECT * FROM DBA_OBJECTS
2 WHERE OWNER = 'SYS'
3 AND OBJECT_TYPE IN ('TABLE', 'INDEX', 'SYNONYM');

表已创建。

SQL> CREATE TABLE T1 AS SELECT * FROM DBA_TAB_PRIVS;

表已创建。

SQL> CREATE INDEX IND_T1_GRANTEE ON T1(GRANTEE);

索引已创建。

SQL> CREATE INDEX IND_T1_NAME ON T1(TABLE_NAME);

索引已创建。

SQL> CREATE UNIQUE INDEX IND_T_NAME ON T(OBJECT_NAME);

索引已创建。

SQL> CONN YANGTK/YANGTK@YTK92已连接。
SQL> DROP DATABASE LINK TEST1;

数据库链接已删除。

SQL> CREATE DATABASE LINK TEST1 USING 'TEST1';

数据库链接已创建。

SQL> CREATE TABLE T2 AS SELECT * FROM DBA_SYNONYMS;

表已创建。

SQL> CREATE INDEX IND_T2_NAME ON T2(SYNONYM_NAME);

索引已创建。

SQL> SET AUTOT ON EXP
SQL> SELECT OBJECT_ID, OBJECT_NAME FROM T@TEST1
2 WHERE OBJECT_NAME IN
3 (
4 SELECT TABLE_NAME FROM T1@TEST1
5 WHERE GRANTEE IN
6 (SELECT GRANTEE FROM T1@TEST1 WHERE TABLE_NAME = 'HELP')
7 )
8 AND OBJECT_NAME NOT IN (SELECT SYNONYM_NAME FROM T2);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
3387 ODCI_SECOBJ$
3388 ODCI_WARNINGS$
3200 PSTUBTBL

执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 FILTER
2 1 MERGE JOIN
3 2 SORT (JOIN)
4 3 REMOTE* TEST1.YTK_THINKPAD
5 2 SORT (JOIN)
6 5 VIEW OF 'VW_NSO_1'
7 6 REMOTE* TEST1.YTK_THINKPAD
8 1 TABLE ACCESS (FULL) OF 'T2'


4 SERIAL_FROM_REMOTE SELECT "OBJECT_NAME","OBJECT_ID" FROM "T" "S YS_ALIAS_1"

7 SERIAL_FROM_REMOTE SELECT /*+ */ DISTINCT "A1"."TABLE_NAME" FRO
M "T1" "A1" WHERE EXISTS (SELECT 0

而如果将优化模式改为FIRST_ROWS

SQL> SELECT /*+ FIRST_ROWS */ OBJECT_ID, OBJECT_NAME FROM T@TEST1
2 WHERE OBJECT_NAME IN
3 (
4 SELECT TABLE_NAME FROM T1@TEST1
5 WHERE GRANTEE IN
6 (SELECT GRANTEE FROM T1@TEST1 WHERE TABLE_NAME = 'HELP')
7 )
8 AND OBJECT_NAME NOT IN (SELECT SYNONYM_NAME FROM T2);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
3200 PSTUBTBL
3200 PSTUBTBL
3387 ODCI_SECOBJ$
3387 ODCI_SECOBJ$
3388 ODCI_WARNINGS$
3388 ODCI_WARNINGS$

已选择6行。

执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=1647 Card=
45 Bytes=4320)

1 0 FILTER
2 1 NESTED LOOPS (Cost=477 Card=45 Bytes=4320)
3 2 VIEW OF 'VW_NSO_1' (Cost=12 Card=462 Bytes=7854)
4 3 REMOTE* TEST1.YTK_THINKPAD
5 2 REMOTE* (Cost=1 Card=1 Bytes=79) TEST1.YTK_THINKPAD
6 1 INDEX (FULL SCAN) OF 'IND_T2_NAME' (NON-UNIQUE) (Cost=26
Card=511 Bytes=8687)

4 SERIAL_FROM_REMOTE SELECT "A1"."TABLE_NAME" FROM "T1" "A1" WHER
E EXISTS (SELECT 0 FROM "T1" "A2" W

5 SERIAL_FROM_REMOTE SELECT "OBJECT_NAME","OBJECT_ID" FROM "T" "S YS_ALIAS_1" WHERE "OBJECT_NAME"=:1

现在得到的每条记录都是重复的。根据上面的SQL和建表语句可以推断,OBJECT_NAME是不会发生重复的,因此可以确定,在FIRST_ROWS模式下,Oracle产生了重复记录。

其实详细观察Oracle的执行计划也可以看到,由于采用了IN的方式,这里的执行计划应该是半连接,也就是SEMI-JOIN,但是Oracle却错误的选择了NESTED_LOOP执行计划,下面看看ALL_ROWSFIRST_ROWS(1)优化模式下的执行计划:

SQL> SELECT /*+ FIRST_ROWS(1) */ OBJECT_ID, OBJECT_NAME FROM T@TEST1
2 WHERE OBJECT_NAME IN
3 (
4 SELECT TABLE_NAME FROM T1@TEST1
5 WHERE GRANTEE IN
6 (SELECT GRANTEE FROM T1@TEST1 WHERE TABLE_NAME = 'HELP')
7 )
8 AND OBJECT_NAME NOT IN (SELECT SYNONYM_NAME FROM T2);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
3387 ODCI_SECOBJ$
3388 ODCI_WARNINGS$
3200 PSTUBTBL

执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=16 Card=1
Bytes=96)

1 0 FILTER
2 1 NESTED LOOPS (SEMI) (Cost=14 Card=1 Bytes=96)
3 2 REMOTE* (Cost=2 Card=31 Bytes=2449) TEST1.YTK_THINKPAD
4 2 VIEW OF 'VW_NSO_1' (Cost=12 Card=462 Bytes=7854)
5 4 REMOTE* TEST1.YTK_THINKPAD
6 1 INDEX (FULL SCAN) OF 'IND_T2_NAME' (NON-UNIQUE) (Cost=2
Card=1 Bytes=17)

3 SERIAL_FROM_REMOTE SELECT "OBJECT_NAME","OBJECT_ID" FROM "T" "S YS_ALIAS_1"

5 SERIAL_FROM_REMOTE SELECT "A1"."TABLE_NAME" FROM "T1" "A1" WHER
E EXISTS (SELECT 0 FROM "T1" "A2" W


SQL> SELECT /*+ ALL_ROWS */ OBJECT_ID, OBJECT_NAME FROM T@TEST1
2 WHERE OBJECT_NAME IN
3 (
4 SELECT TABLE_NAME FROM T1@TEST1
5 WHERE GRANTEE IN
6 (SELECT GRANTEE FROM T1@TEST1 WHERE TABLE_NAME = 'HELP')
7 )
8 AND OBJECT_NAME NOT IN (SELECT SYNONYM_NAME FROM T2);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
3200 PSTUBTBL
3387 ODCI_SECOBJ$
3388 ODCI_WARNINGS$

执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: ALL_ROWS (Cost=873 Card=33
Bytes=3168)

1 0 FILTER
2 1 HASH JOIN (SEMI) (Cost=15 Card=33 Bytes=3168)
3 2 REMOTE* (Cost=2 Card=33 Bytes=2607) TEST1.YTK_THINKPAD
4 2 VIEW OF 'VW_NSO_1' (Cost=12 Card=462 Bytes=7854)
5 4 REMOTE* TEST1.YTK_THINKPAD
6 1 INDEX (FULL SCAN) OF 'IND_T2_NAME' (NON-UNIQUE) (Cost=26
Card=511 Bytes=8687)

3 SERIAL_FROM_REMOTE SELECT "OBJECT_NAME","OBJECT_ID" FROM "T" "S YS_ALIAS_1"

5 SERIAL_FROM_REMOTE SELECT "A1"."TABLE_NAME" FROM "T1" "A1" WHER
E EXISTS (SELECT 0 FROM "T1" "A2" W

ALL_ROWS模式选择了HASH_SJFIRST_ROWS(1)选择了NL_SJ,因此都得到了正确的结果,而FIRST_ROWS由于使用了NEST_LOOP而导致结果集重复。

SQL语句中的IN换成EXISTS语句,则不会出现错误的结果:

SQL> SELECT /*+ FIRST_ROWS */ OBJECT_ID, OBJECT_NAME FROM T@TEST1
2 WHERE EXISTS
3 (
4 SELECT 1 FROM T1@TEST1
5 WHERE TABLE_NAME = OBJECT_NAME
6 AND GRANTEE IN
7 (SELECT GRANTEE FROM T1@TEST1 WHERE TABLE_NAME = 'HELP')
8 )
9 AND OBJECT_NAME NOT IN (SELECT SYNONYM_NAME FROM T2);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
3387 ODCI_SECOBJ$
3388 ODCI_WARNINGS$
3200 PSTUBTBL

执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=78 Card=2
Bytes=158)

1 0 FILTER
2 1 REMOTE* (Cost=2 Card=2 Bytes=158) TEST1.YTK_THINKPAD
3 1 FILTER
4 3 REMOTE* (Cost=2 Card=5 Bytes=170) TEST1.YTK_THINKPAD
5 3 REMOTE* (Cost=2 Card=1 Bytes=34) TEST1.YTK_THINKPAD
6 1 INDEX (FULL SCAN) OF 'IND_T2_NAME' (NON-UNIQUE) (Cost=26
Card=511 Bytes=8687)

2 SERIAL_FROM_REMOTE SELECT "OBJECT_NAME","OBJECT_ID" FROM "T" "S YS_ALIAS_2"

4 SERIAL_FROM_REMOTE SELECT "GRANTEE","TABLE_NAME" FROM "T1" "A1"
WHERE "TABLE_NAME"=:1

5 SERIAL_FROM_REMOTE SELECT "GRANTEE","TABLE_NAME" FROM "T1" "A2"
WHERE "GRANTEE"=:1 AND "TABLE_NAME"

如果将TT1换成本地表,也不会出现错误:

SQL> CREATE TABLE T AS SELECT * FROM DBA_OBJECTS
2 WHERE OWNER = 'SYS'
3 AND OBJECT_TYPE IN ('TABLE', 'INDEX', 'SYNONYM');

表已创建。

SQL> CREATE TABLE T1 AS SELECT * FROM DBA_TAB_PRIVS;

表已创建。

SQL> CREATE INDEX IND_T1_GRANTEE ON T1(GRANTEE);

索引已创建。

SQL> CREATE INDEX IND_T1_NAME ON T1(TABLE_NAME);

索引已创建。

SQL> CREATE UNIQUE INDEX IND_T_NAME ON T(OBJECT_NAME);

索引已创建。

SQL> COL OBJECT_NAME FORMAT A30
SQL> SELECT /*+ FIRST_ROWS */ OBJECT_ID, OBJECT_NAME FROM T
2 WHERE OBJECT_NAME IN
3 (
4 SELECT TABLE_NAME FROM T1
5 WHERE GRANTEE IN
6 (SELECT GRANTEE FROM T1 WHERE TABLE_NAME = 'HELP')
7 )
8 AND OBJECT_NAME NOT IN (SELECT SYNONYM_NAME FROM T2);

OBJECT_ID OBJECT_NAME
---------- ------------------------------
29559 AW$AWCREATE
29567 AW$AWMD
29527 AW$EXPRESS
3387 ODCI_SECOBJ$
3388 ODCI_WARNINGS$
29573 OLAPTABLEVELS
29576 OLAPTABLEVELTUPLES
3200 PSTUBTBL

已选择8行。

执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=343 Card=6 Bytes=576)
1 0 NESTED LOOPS (Cost=343 Card=6 Bytes=576)
2 1 VIEW OF 'VW_NSO_1' (Cost=228 Card=113 Bytes=1921)
3 2 SORT (UNIQUE)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=34)
5 4 NESTED LOOPS (Cost=228 Card=113 Bytes=7684)
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=113 Bytes=3842)
7 6 INDEX (RANGE SCAN) OF 'IND_T1_NAME' (NON-UNIQUE) (Cost=1 Card=45)
8 5 INDEX (RANGE SCAN) OF 'IND_T1_GRANTEE' (NON-UNIQUE) (Cost=1 Card=1)
9 1 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=1 Card=1 Bytes=79)
10 9 INDEX (UNIQUE SCAN) OF 'IND_T_NAME' (UNIQUE)
11 10 INDEX (FULL SCAN) OF 'IND_T2_NAME' (NON-UNIQUE) (Cost=26 Card=511 Bytes=8687)

由于本地和远端表中的数据不同,这里显示的结果也不同,不过这个结果并没有重复数据。虽然Oracle使用的也是NESTED_LOOP,但是Oracle在连接后使用了一个SORT UNIQUE的操作,这个操作可以去掉重复的记录,而如果访问的是远端对象,则不存在这个操作。

从例子中几点分析可以得到这篇文章最初的推论。

yangtingkun 发表于:2008.04.23 13:44 ::分类: ( ORACLE , Bug ) ::阅读:(244次) :: 评论 (2)
re: FIRST_ROWS优化模式访问远端表可能导致错误结果(一) [回复]

既然访问远端表会改变执行计划,是否应该遵守一个原则,不使用远端表,如果需要都同步到本地再使用?

kylin 评论于: 2008.04.24 16:56
re: FIRST_ROWS优化模式访问远端表可能导致错误结果(一) [回复]

这个是bug造成的,和访问策略无关。

尽量不访问远端表的主要考虑因素是性能和数据库的独立性

yangtingkun 评论于: 2008.04.25 00:28

发表评论
标题

在此添加评论
表情符号: smile laughing tongue angry crying sad wassat wink

称呼

邮箱地址(可选)

个人主页(可选)

 authimage


切换风格
新闻聚合
博客日历
文章归档...
最新发表...
最新评论...
最多阅读文章...
最多评论文章...
博客统计...
Blog信息
网站链接...