Introduction. The Rank family
The traditional SQL statements and functions do not provide strong support for analytic tasks. Basic business intelligence calculations such as moving averages, rankings and lead/lag comparisons have required extensive programming outside of standard SQL , often with performance challenges.
From Oracle 8.1.6 Oracle provides powerful new family of functions which are useful for analyzing data warehouses or simple big data sets.
Oracle has created four families of analytic functions
- Ranking family: RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST
- Window Aggregate family: AVG, SUM, MIN, MAX, COUNT, VARIANCE and STDDEV (typically with fix sized window)
- Reporting Aggregate family: AVG, SUM, MIN, MAX, COUNT, VARIANCE and STDDEV , RATIO_TO_REPORT (typically with varying sized window)
- LAG/LEAD family: LAG and LEAD
It is important to understand the processing order of the functionalyties of the analytic SQL statements
- The joins will be executed
- Evaluating the WHERE, GROUP BY, HAVING clauses
- Partitions created
- Analytic functions applied to each row in each partition
- ORDER BY is processed to allow for precise output ordering.
In the following examples we will use the the sample schemas (HR,SH) which are available in all Oracle database.
Ranking family:
Example 1:
Let’s suppose we want rank the salaries in the employees table.
Issuing the following SELECT
SELECT ROWNUM,last_name,salary from
(SELECT * FROM EMPLOYEES ORDER BY salary DESC) t
WHERE ROWNUM<=&counter;
for counter=2 :
1 KING 24000
2 KOCHHAR 17000
for counter=3 :
1 KING 24000
2 KOCHHAR 17000
3 DE HAAN 17000
Which are the ranks for Kochhar and De Haan?
Solution:
SELECT last_name,salary,
RANK() OVER( ORDER BY salary DESC) normal_rank,
DENSE_RANK() OVER( ORDER BY salary desc) dense_rank
FROM employees;
Example 2:
Let’s see the top 5 best products regarding the revenue for each year:
SELECT t.*
FROM (select t.calendar_year year , p.prod_name product, SUM(s.amount_sold), SUM(s.quantity_sold),
RANK() OVER (PARTITION BY t.calendar_year ORDER BY SUM(s.amount_sold) DESC ) rank
FROM sh.sales s, sh.times t ,sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
GROUP BY t.calendar_year,p.prod_name) t
WHERE t.rank<=5;
Example 3:
We can retrieve the 4 best products regarding the revenue for each product category:
SELECT t.*
FROM (select p.prod_category,t.calendar_quarter_desc quarter , ROUND(SUM(s.amount_sold)) amount ,SUM(s.quantity_sold) quantity,
RANK() OVER (PARTITION BY P.prod_category
ORDER BY SUM(s.amount_sold) DESC ) rank
FROM sh.sales s, sh.times t ,sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=P.prod_id
GROUP BY p.prod_category,calendar_quarter_desc ) t
WHERE t.rank<=4
/
Example 4:
Now we consider the cumulative distribution of the salaries for each departments:
SELECT department_id, last_name, last_name, salary,
CUME_DIST() OVER(PARTITION BY department_id ORDER BY Salary)
FROM employees;
Example 5:
We create ranks and subtotals for amounts in each year:
SELECT t.*
FROM (select t.calendar_year year , p.prod_name product , SUM(s.amount_sold) amount,SUM(s.quantity_sold),
RANK() OVER (PARTITION BY t.calendar_year ORDER BY SUM(s.amount_sold) DESC ) -1 rank,GROUPING(p.prod_name) grp_id
FROM sh.sales s, sh.times t ,sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
GROUP BY ROLLUP ( t.calendar_year,p.prod_name)) t
WHERE t.rank<=5
ORDER BY year,grp_id,amount DESC
/
Example 6:
The ROW_NUMBER function – which is not equivalent with neither ROWNUM nor RANK – assigns a unique number, in the sequence defined by the ORDER BY clause, to each row within the partition.
SELECT ROW_NUMBER() OVER(ORDER BY department_id) row_number , ROWNUM ,
ROW_NUMBER() OVER( PARTITION BY department_id ORDER BY salary DESC) dept_row_number,
department_id,last_name,salary,
RANK() OVER( PARTITION BY department_id ORDER BY salary DESC) normal_rank,
DENSE_RANK() OVER( PARTITION BY department_id ORDER BY salary DESC) dense_rank
FROM employees
/
To be continued…