Limiting the percentage of ordered rows retrieved in Oracle 12c


In a previous post I compared two solutions for the same ranking
problem. Now I would like to compare two solutions for the same
percentages report.
The goal of the query: Let's see the first 5% rows of the total
rows according to the salary in descending way.
Let's consider the solution in Oracle 12c, where we can use the
new FETCH {FIRST|NEXT}  <pct> PERCENT ROWS clause:
SELECT /*+ GATHER_PLAN_STATISTICS */  employee_id, last_name, salary
FROM employees  ORDER BY salary DESC  FETCH FIRST 5 PERCENT ROWS
ONLY;

EMPLOYEE_ID LAST_NAME      SALARY
----------- ------------ --------
100 King               24000
101 Kochhar            17000
102 De Haan            17000
145 Russell            14000
146 Partners           13500
201 Hartstein          13000

6 rows selected
One of the possible traditional solution uses
the cumulative distribution (CUME_DIST) analytic function.
SELECT /*+ GATHER_PLAN_STATISTICS */  E.*
FROM
(SELECT employee_id, last_name, salary,
CUME_DIST()  OVER( ORDER BY  Salary DESC) cum_dist
FROM employees  ORDER BY salarY DESC) E
WHERE E.CUM_DIST<=0.05;

EMPLOYEE_ID LAST_NAME     SALARY        CUM_DIST
----------- --------- ----------   -------------
100         King           24000   0.009345794393
101      Kochhar           17000   0.02803738318
102      De Haan           17000   0.02803738318
145      Russell           14000   0.03738317757
146     Partners           13500   0.04672897196
Let's consider the execution plan for that query which 
uses the new FETCH {FIRST|NEXT} <pct> PERCENT ROWS clause:
Executions:1 | is_bind_sensitive:N  | is_bind_aware: N | Parsing schema:HR | Disk reads:0 | Consistent gets:1555
SQL_ID  1jnwttv1rt22u, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */  employee_id, last_name, salary
FROM employees  ORDER BY salary DESC  FETCH FIRST 5 PERCENT ROWS ONLY

Plan hash value: 720055818

---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |       |       |     3 (100)|          |
|*  1 |  VIEW               |           |   107 |  8453 |     3   (0)| 00:00:01 |
|   2 |   WINDOW SORT       |           |   107 |  1712 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMPLOYEES |   107 |  1712 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CEIL("from$_
subquery$_002"."rowlimit_$$_total"*5/100))

Operation_id:1 last ouput rows:6 Query block name: SEL$1
Operation_id:2 last ouput rows:107 Query block name: SEL$1
Operation_id:3 last ouput rows:107 Query block name: SEL$1
====================================================================================================

Now we examine the execution plan of the query that uses the CUME_DIST analytic function:

Executions:1 | is_bind_sensitive:N  | is_bind_aware: N | Parsing schema:HR | Disk reads:0 | Consistent gets:7
SQL_ID  0u2b7mdnfy5nh, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */  E.* FROM (SELECT employee_id,
last_name, salary, CUME_DIST()  OVER( ORDER BY  Salary DESC) cum_dist
FROM employees  ORDER BY salarY DESC) E WHERE E.CUM_DIST<=0.05

Plan hash value: 720055818

---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |       |       |     3 (100)|          |
|*  1 |  VIEW               |           |   107 |  5671 |     3   (0)| 00:00:01 |
|   2 |   WINDOW SORT       |           |   107 |  1712 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMPLOYEES |   107 |  1712 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("E"."CUM_DIST"<=.05)

Operation_id:1 last ouput rows:5 Query block name: SEL$2
Operation_id:2 last ouput rows:107 Query block name: SEL$2
Operation_id:3 last ouput rows:107 Query block name: SEL$2
====================================================================================================
We can observe that BOTH queries use the SAME EXECUTION plan,
but with different number of consistent gets (1551 versus 7!). 
Other metrics are same. 
I created a greater table:
CREATE TABLE big_emp(empno,last_name,first_name,salary)
AS
SELECT ROWNUM,E.last_name||ROWNUM,E.first_name||ROWNUM,E.salary
FROM employees E,employees D,employees F;
SELECT COUNTt(*)  FROM big_emp;

COUNT(*)
----------
1225043

I executed the following queries:

SELECT /*+ GATHER_PLAN_STATISTICS */  *
FROM big_emp  ORDER BY salary DESC  FETCH FIRST 1 PERCENT ROWS
ONLY;

SELECT /*+ GATHER_PLAN_STATISTICS */  E.*
FROM
(SELECT b.*,
CUME_DIST()  OVER( ORDER BY  Salary DESC) cum_dist
FROM big_emp b ORDER BY salarY DESC) E
WHERE E.CUM_DIST<=0.01;

Of course, I got the same execution plan.
But the were no big difference between the number of consistent gets.

Executions:1 | is_bind_sensitive:N  | is_bind_aware: N | Parsing schema:HR | Disk reads:0 | Consistent gets:6693
SQL_ID  5cv1yxs3aht9g, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */  E.* FROM (SELECT b.*, CUME_DIST()
OVER( ORDER BY  Salary DESC) cum_dist FROM big_emp b ORDER BY salarY
DESC) E WHERE E.CUM_DIST<=0.01

Plan hash value: 1432758025
Executions:1 | is_bind_sensitive:N  | is_bind_aware: N | Parsing schema:HR | Disk reads:6684 | Consistent gets:8059
SQL_ID  gg7n3x4px3tak, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */  * FROM big_emp  ORDER BY salary
DESC  FETCH FIRST 1 PERCENT ROWS ONLY

Plan hash value: 1432758025
The number of consistent gets were 
(6693- for traditional versus 8059 - new for feature)
As I saw we can use both solutions,but the FETCH clause
is more readable and user-friend.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: