Analyzing Oracle Data Warehouse with Analytic Functions I.


Introduction. The Rank family

 

The traditional SQL statements and functions do not provide strong support for analytic tasks. Basic business intelligence calculations such as moving averages, rankings and lead/lag comparisons have required extensive programming outside of standard SQL , often with performance challenges.

From Oracle 8.1.6 Oracle provides powerful new family of functions which are useful for analyzing data warehouses or simple big data sets.

Oracle has created four families of analytic functions

  • Ranking family: RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST
  • Window Aggregate family: AVG, SUM, MIN, MAX, COUNT, VARIANCE and STDDEV (typically with fix sized window)
  • Reporting Aggregate family: AVG, SUM, MIN, MAX, COUNT, VARIANCE and STDDEV , RATIO_TO_REPORT (typically with varying sized window)
  • LAG/LEAD family: LAG  and LEAD

It is important to understand the processing order of the functionalyties of the analytic SQL statements

  • The joins will be executed
  • Evaluating the WHERE, GROUP BY, HAVING clauses
  • Partitions created
  • Analytic functions applied to each row in each partition
  • ORDER BY is processed to allow for precise output ordering.

In the following examples we will use the the sample schemas (HR,SH) which are available in all Oracle database.

Ranking family:

Example 1:

Let’s suppose we want rank the salaries in the employees table.

Issuing the following SELECT

SELECT ROWNUM,last_name,salary from
(SELECT * FROM EMPLOYEES ORDER BY salary DESC) t
WHERE ROWNUM<=&counter;

for counter=2  :

1 KING               24000

2 KOCHHAR      17000

for counter=3  :

 1 KING               24000

2 KOCHHAR      17000

3 DE HAAN        17000

Which are the ranks for Kochhar and De Haan?
Solution:

SELECT last_name,salary,
RANK() OVER( ORDER BY salary DESC) normal_rank,
DENSE_RANK() OVER( ORDER BY salary desc) dense_rank
FROM employees;

Example 2:
Let’s see the top 5 best products regarding the revenue for each year:

SELECT t.*
FROM (select t.calendar_year year , p.prod_name product, SUM(s.amount_sold), SUM(s.quantity_sold),
RANK()  OVER (PARTITION BY t.calendar_year ORDER BY SUM(s.amount_sold) DESC ) rank
FROM sh.sales s, sh.times t ,sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
GROUP BY t.calendar_year,p.prod_name) t
WHERE t.rank<=5;

Example 3:

We can retrieve the 4 best products regarding the revenue for each product category:

SELECT t.*
FROM (select p.prod_category,t.calendar_quarter_desc quarter  , ROUND(SUM(s.amount_sold)) amount ,SUM(s.quantity_sold) quantity,
RANK()  OVER (PARTITION BY P.prod_category
ORDER BY SUM(s.amount_sold) DESC ) rank
FROM sh.sales s, sh.times t ,sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=P.prod_id
GROUP BY p.prod_category,calendar_quarter_desc ) t
WHERE t.rank<=4
/

Example 4:
Now we consider the cumulative distribution of the salaries for each departments:

SELECT department_id, last_name, last_name, salary,
CUME_DIST() OVER(PARTITION BY department_id ORDER BY  Salary)
FROM employees;

Example 5:
We create ranks and subtotals  for amounts in  each year:

SELECT t.*
FROM (select t.calendar_year year , p.prod_name  product , SUM(s.amount_sold) amount,SUM(s.quantity_sold),
RANK()  OVER (PARTITION BY t.calendar_year ORDER BY SUM(s.amount_sold) DESC ) -1 rank,GROUPING(p.prod_name) grp_id
FROM sh.sales s, sh.times t ,sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
GROUP BY ROLLUP ( t.calendar_year,p.prod_name)) t
WHERE t.rank<=5
ORDER BY year,grp_id,amount DESC
/

Example 6:
The ROW_NUMBER function – which is not equivalent with neither ROWNUM nor RANK – assigns a unique number, in the sequence defined by the ORDER BY clause, to each row within the partition.

SELECT  ROW_NUMBER() OVER(ORDER BY department_id) row_number , ROWNUM ,
ROW_NUMBER() OVER( PARTITION BY department_id ORDER BY salary DESC) dept_row_number,
department_id,last_name,salary,
RANK() OVER( PARTITION BY department_id ORDER BY salary DESC) normal_rank,
DENSE_RANK() OVER( PARTITION BY department_id ORDER BY salary DESC) dense_rank
FROM employees
/
To be continued…

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: