Analytical Function

- Analytical functions are used mainly for the analysis of data as required by decision making managers
- Typical queries such as What are the top ten products sold in the rainy seasons? and In which region the product is selling less?
- Analytic functions are computed after all joins, WHERE clause, GROUP BY and HAVING are computed on the query.
- Such queries are computation intensive and these queries are statistical in nature and do not require much data details like name or address of the customers.
- The technique of extraction of information from data warehouse is called data mining
- OLTP server are quite computation intensive and require large memory resources like memory and CPU

* Ranking : For calculating ranks , rank, dense_rank, row_number

* Windowing : For calculating the cumulative and moving aggregates eg. For 7 days

* Reporting : For calculating shares, eg. Market share of a product among products of the same category

* Lag/Lead : Comparing values of two rows like finding out increase in value of the current row over the previous row

* First/Last : First or last value in an ordered group, eg. Value of sale on the last date of the month

* Rollup/cube : Cube and rollup extensions to the group by clause of the select statement

Example with some data

SQL> SELECT * FROM sales;

YEAR REGION AMT
---- -------------------- ----------
1998 Singapore 2000
1998 India 35000
1998 Singapore 45000
1999 India 20000
2000 USA 1000
2000 Singapore 7000
2000 India 15000

7 rows selected.
Ranking Function
rank()
- Ranks are used for finding rank of an item in a group
- Ranks are determined by each partition based on order by clause

Syntax

RANK() OVER (
[ PARTITION BY [,….] ]
ORDER BY [collate clause] [ASC | DESC] [NULLS FIRST | LAST] [,…].
)
- PARTITION BY indicates how the rows in the result sets are partitioned. In its absence, all rows are partitioned into one group.
- Partition expression can be either a column expression or an aggregate expression ORDER BY indicates how the rows are sorted. The Order by expression can either be a column expression or an aggregate expression.
-By default NULLS have the highest values and can specify how nulls are to be treated.

Example

SQL> select year, region, amt,
rank() over ( partition by year order by amt ) rn
from sales;

YEAR REGION AMT RN
---- -------------------- ---------- ----------
1998 Singapore 2000 1
1998 India 35000 2
1998 Singapore 45000 3
1998 USA 70000 4
1999 India 20000 1
2000 USA 1000 1
2000 Singapore 7000 2
2000 India 15000 3

8 rows selected.

dense_rank()
- Dense rank is that in the dense rank there is no gap between the ranks


Syntax

DENSE_RANK() OVER(
[ PARTITION BY [,….] ]
ORDER BY [ collate clause ] [ ASC | DESC ] [ NULLS FIRST | LAST ]
)
SQL>SELECT year,region,amt,
dense_rank() over (partition by year order by amt) rn
from sales;

YEAR REGION AMT RN
---- -------------------- ---------- ----------
1998 Singapore 2000 1
1998 India 35000 2
1998 Singapore 35000 2
1998 USA 70000 3
1999 India 20000 1
2000 USA 1000 1
2000 Singapore 7000 2
2000 India 15000 3

8 rows selected.

row_number()
- row_number function assigns unique rank to each row even if they are having the same value of order by expression.

ROW_NUMBER() OVER (
[ PARTITION BY [,….] ]
ORDER BY [ collate clause ] [ ASC | DESC ] [ NULLS FIRST | LAST ][,…].
)
SQL> SELECT year,region,amt,
row_number() over (partition by year order by amt) rn from sales;
YEAR REGION AMT RN
---- -------------------- ---------- ----------
1998 Singapore 2000 1
1998 India 35000 2
1998 Singapore 35000 3
1998 USA 70000 4
1999 India 20000 1
2000 USA 1000 1
2000 Singapore 7000 2
2000 India 15000 3

8 rows selected.


Windowing Function
- Windowing functions are used for calculating moving averages, centered averages, cumulative sums and latest values
- Return values for each row and these values depend on the values of other rows


Example : Finding out cumulative sum of amount
SQL> SELECT year, region, amt,
sum(amt) over(partition by year order by amt rows unbounded preceding) tot_sum
from sales;

YEAR REGION AMT TOT_SUM
---- -------------------- ---------- ----------
1998 Singapore 2000 2000
1998 India 35000 37000
1998 Singapore 35000 72000
1998 USA 70000 142000
1999 India 20000 20000
2000 USA 1000 1000
2000 Singapore 7000 8000
2000 India 15000 23000

8 rows selected.

Example : Finding out last amount in the year

SQL> SELECT year,region,amt,
last_value(amt) over( partition by year order by amt ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) lat_amt
from sales;

YEAR REGION AMT LAT_AMT
---- -------------------- ---------- ----------
1998 Singapore 2000 70000
1998 India 45000 70000
1998 Singapore 45000 70000
1998 USA 70000 70000
1999 India 20000 20000
2000 USA 1000 15000
2000 Singapore 7000 15000
2000 India 15000 15000

8 rows selected.


Lag/Lead Function
- Lag/Lead are used to compare the value of the current row with the value of the previous rows or next rows without taking a Cartesian product. Lag is used for precious row and Lead is used for the next row

[LAG | LEAD ] ( , [[,]])
OVER (
[ PARTITION BY [,….] ]
ORDER BY [ collate clause ] [ ASC | DESC ] [ NULLS FIRST | LAST ][,…].
)
SQL> select year, region, amt, lag(amt,1) over(order by amt) lag_amt from sales;
YEAR REGION AMT LAG_AMT
---- -------------------- ---------- ----------
2000 USA 1000
1998 Singapore 2000 1000
2000 Singapore 7000 2000
2000 India 15000 7000
1999 India 20000 15000
1998 Singapore 45000 20000
1998 India 45000 45000
1998 USA 70000 45000

8 rows selected.

FIRST/LAST function
- FIRST/LAST functions are used to return the aggregate value for the set of rows which ranks first or last in a set of wors defined bh partition.
- Ranks are determined using the order by clause.

[ MIN|MAX|COUNT|SUM|AVG|STDDEV|VARIANCE] ()
KEEP (DENSE_RANK [FIRST|LAST] ORDER BY [,…]
[ ASC| DESC] [NULLS FIRST | NULLS LAST] }
SQL> select year, region, amt,
min(amt) KEEP ( DENSE_RANK FIRST order by (amt) ) over() "minvalue"
from sales;

YEAR REGION AMT minvalue
---- -------------------- ---------- ----------
1998 USA 70000 1000
1999 India 20000 1000
1998 Singapore 35000 1000
1998 India 35000 1000
1998 Singapore 2000 1000
2000 Singapore 7000 1000
2000 India 15000 1000
2000 USA 1000 1000

8 rows selected.

No comments: