Analyzing Oracle Data Warehouse with Analytic Functions II.


Window and Report Aggregate Family

Window aggregate functions operate on an ordered set of rows and for each row they return an aggregate value for a window rooted at that row.
The window can be unbounded, partially unbounded or bounded. The size of the window can be specified by physically (eg. With number of rows, ROWS) or logically (typically with date/time values, RANGE).

There are several keywords available:

CURRENT ROW – specifies that window starts or ends at the current row

UNBOUNDED PRECEDING – specifies that the window starts at the first row of the partition.

UNBOUNDED FOLLOWING – specifies that the window ends at the last row of the partition.
If we don’t specify explicitly the one of the boundaries the CURRENT ROW is the default.
Example 1:
We want to create a cummulated aggregation (sometimes called running aggregates) for the salaries for the whole table and for each department:

SELECT department_id,employee_id, last_name, salary, 
SUM(salary) 
OVER (ORDER BY salary DESC RANGE UNBOUNDED PRECEDING) cum_range,
SUM(salary) 
OVER (PARTITION BY department_id 
ORDER BY salary DESC RANGE UNBOUNDED PRECEDING) 
AS cum_range_for_depts,
SUM(salary) 
OVER (ORDER BY salary desc ROWS UNBOUNDED PRECEDING)  cum_rows,
SUM(salary) 
OVER (PARTITION BY department_id 
ORDER BY salary DESC ROWS UNBOUNDED PRECEDING)  cum_rows_for_depts
FROM employees
 --ORDER BY department_id ;


Example 2:
Now we define an increasing, decreasing and fix sized window for each department:

SELECT department_id, last_name, salary,
SUM(salary) 
OVER (PARTITION BY department_id ORDER BY salary DESC
ROWS UNBOUNDED PRECEDING)  AS cum_sum,
SUM(salary) 
OVER (PARTITION BY department_id ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  till_current,
AVG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC
RANGE UNBOUNDED PRECEDING) cum_avg , AVG(salary)
OVER ( PARTITION BY department_id 
ORDER BY salary DESC 
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) avg_sal_rest,
AVG(salary)   OVER (PARTITION BY department_id) AS dept_avg
FROM employees


Example 3:
We want to select to 5 most popular products and running total for each quarter in each year.

SELECT CASE WHEN t.rank=1 THEN t.year END year, t.quarter, t.product, 
t.amount, t.quantity, t.rsum_for_quarter, t.rank rank_for_amount
FROM
(SELECT t.calendar_year year,t.calendar_quarter_desc quarter, 
p.prod_name product, SUM(s.amount_sold) amount, 
SUM(SUM(s.amount_sold))
OVER (PARTITION BY t.calendar_year,t.calendar_quarter_desc
ORDER BY SUM(s.amount_sold) DESC RANGE UNBOUNDED PRECEDING) 
rsum_for_quarter, 
SUM(s.quantity_sold) quantity,
RANK()  OVER (PARTITION BY t.calendar_year , t.calendar_quarter_desc
ORDER BY SUM(s.amount_sold) DESC ) rank
FROM sh.sales s, sh.times t, sh.products p
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
GROUP BY t.calendar_year,t.calendar_quarter_desc,p.prod_name) t
WHERE t.rank < 5;


In the SUM(SUM(s.amount_sold)) expression the inner SUM is standard group function, which is the argument of the outer SUM that is an analytical function.
Example 4:
Many times we want to see each element’s contribution (rate) for the total value.
Also called percentages or sharing value.
In the following example we compute the percentages of each  salary of subtotal (department level) of grand total:

SELECT department_id, last_name, salary,
RATIO_TO_REPORT(salary)
OVER (PARTITION BY department_id)  AS rate_for_dept,
RATIO_TO_REPORT(salary) OVER ()   AS rate_for_total
FROM employees;


Example 5:
Now we want to compare each employee’s salary to the lowest anf highest salaries of his/her department using the FIRST_VALUE and LAST_VALUE functions.

SELECT t.*,t.salary/t.lowest_sal  rate_for_lowest_sal,
t.salary/t.highest_sal AS rate_for_highest_sal
FROM ( SELECT department_id , last_name, salary,
FIRST_VALUE(salary)
OVER ( PARTITION BY department_id ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lowest_sal,
LAST_VALUE(salary) OVER ( PARTITION BY department_id ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) highest_sal
FROM employees 
ORDER BY department_id) t


Example 6:
Many times we need to compute moving values(MAX,SUM,AVG). In this example we calculate moving sum for better understanding:

SELECT last_name, hire_date, salary,
SUM(salary)  
OVER (ORDER BY hire_date RANGE BETWEEN NUMTOYMINTERVAL(3,'MONTH') 
PRECEDING AND NUMTOYMINTERVAL(3,'MONTH') FOLLOWING) AS mov_sum_3_months,
SUM(salary)  
OVER (ORDER BY hire_date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) 
AS mov_sum_3_rows
FROM employees;


Observe the difference between RANGE and ROWS.
With RANGE we specified a six months window, while with ROWS we created a 7 row width window.


To be continued…

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: