Saturday, 21 November 2015

Leading Ranks and Lagging Percentages: Analytic Functions

This post explains 7 analytical functions to manipulate the  way you display the result

1.DENSE RANK  : over (partition by department_id order by salary desc)
2. RANK :  over (partition by department_id order by salary desc)

3. FIRST_VALUE : FIRST(column)  over (partition by department_id order by salary desc)
4. LAST_VALUE

5.LEAD  LAG(column | expressionoffsetdefault)   over (partition by department_id order by salary desc)
6. LAG  LAG(column | expression, offset, default over (partition by department_id order by salary desc)  offset :  how many previous rows it should go back

7.RATIO_TO_REPORT


1. Rank the data

        Display the three employees with highest salaries by department

 The query that retrieves top or bottom N rows from the database that satisfy the certain condition refered as the TOP N query .
  Business Requirement : the most highly paid employees are or which department has the lowest sales figures 


Example :

select department_id, last_name, first_name, salary,
          DENSE_RANK() over (partition by department_id
                                 order by salary desc) dense_ranking
      from employee
      order by department_id, salary desc, last_name, first_name;
 
=========================================================================================
 
 
 
 
 
DEPARTMENT_ID LAST_NAME    FIRST_NAME                    SALARY DENSE_RANKING
————————————— ———————————  —————————————————————————     —————— —————————————
           10 Dovichi      Lori                                             1
           10 Eckhardt     Emily                         100000             2
           10 Newton       Donald                         80000             3
           10 Michaels     Matthew                        70000             4
           10 Friedli      Roger                          60000             5
           10 James        Betsy                          60000             5
           20 peterson     michael                        90000             1
           20 leblanc      mark                           65000             2
           30 Jeffrey      Thomas                        300000             1
           30 Wong         Theresa                        70000             2
              Newton       Frances                        75000             1

11 rows selected.
 
 This result revels the an interesting analytical function DENSE RANK  . When query 
uses a descending order   a NULL  value can affect the outcome of analytical function.
 
By default , with descending sort , SQL views NULL as being higher than any other value .
 
In the result the record Dovichi Lori  has an NULL salary and the DENSE RANK analytical function 
assigns the highest RANK 1 in Department 10.
 
-- You can  eliminate the null by adding the where clause salary is not null.
 
 Alternatively  , yo can use  the   NULL last as the extension to the  Order by clause 
==========================================================================================
 
select department_id, last_name, first_name, salary,
        DENSE_RANK() over (partition by department_id
                               order by salary desc NULLS LAST) dense_ranking
      from employee
    order by department_id, salary desc, last_name, first_name;

DEPARTMENT_ID LAST_NAME    FIRST_NAME                    SALARY DENSE_RANKING
————————————— ———————————  —————————————————————————     —————— —————————————
           10 Dovichi      Lori                                             5
           10 Eckhardt     Emily                         100000             1
           10 Newton       Donald                         80000             2
           10 Michaels     Matthew                        70000             3
           10 Friedli      Roger                          60000             4
           10 James        Betsy                          60000             4
           20 peterson     michael                        90000             1
           20 leblanc      mark                           65000             2
           30 Jeffrey      Thomas                        300000             1
           30 Wong         Theresa                        70000             2
              Newton       Frances                        75000             1

11 rows selected. 
 
 Still the NULL record appears at the top due to the outer order by clause.
  
 --  Quick notes :
 
1. In the query outcome the highlighted  two rows have the same salary 
    and ranked the same rank as 4.
 
2. The next record having just less than salary will have the next rank i.e 5 highlighted in green
 
  DENS_RANK return the ranking number without any gaps , regardless of any records that 
   have same  value for expression in the order by clause.
   In contrast the rank analytical function find the same value records and assign the same rank 
    the subsequent rank number take in account of this by skipping ahead.
 
 
 
 select department_id, last_name, first_name, salary,
           RANK() over (partition by department_id
                            order by salary desc NULLS LAST) regular_ranking
      from employee
    order by department_id, salary desc, last_name, first_name;

DEPARTMENT_ID LAST_NAME    FIRST_NAME                SALARY REGULAR_RANKING
————————————— ———————————  ———————————————————————   —————— ———————————————
           10 Dovichi      Lori                                           6
           10 Eckhardt     Emily                     100000               1
           10 Newton       Donald                     80000               2
           10 Michaels     Matthew                    70000               3
           10 Friedli      Roger                      60000               4
           10 James        Betsy                      60000               4
           20 peterson     michael                    90000               1
           20 leblanc      mark                       65000               2
           30 Jeffrey      Thomas                    300000               1
           30 Wong         Theresa                    70000               2
              Newton       Frances                    75000               1

11 rows selected.

  In department 10 the highlighted  rows shows ranking difference in RANK and DENSE_RANK 
 Same value row got the same rank 4 but subsequent record got the rank 6 instead of 5 in DENSE_RANK.


FINISHING FIRST  OR LAST :

For reporting purpose it might occasionally  useful to include the first value obtained in the perticular group or window.
 then you can use FIRST_VALUE analytical function

Display the first value returned per window, using FIRST_VALUE 
SQL> select last_name, first_name, department_id, hire_date, salary,
  2       FIRST_VALUE(salary)
  3       over (partition by department_id order by hire_date) first_sal_by_dept
  4   from employee
  5  order by department_id, hire_date;

LAST_NAME     FIRST_NAME   DEPARTMENT_ID HIRE_DATE  SALARY FIRST_SAL_BY_DEPT
————————— ——————————————  —————————————— ————————— ——————— —————————————————
Eckhardt      Emily                   10 07-JUL-04  100000            100000
Newton        Donald                  10 24-SEP-06   80000            100000
James         Betsy                   10 16-MAY-07   60000            100000
Friedli       Roger                   10 16-MAY-07   60000            100000
Michaels      Matthew                 10 16-MAY-07   70000            100000
Dovichi       Lori                    10 07-JUL-11                    100000
peterson      michael                 20 03-NOV-08   90000             90000
leblanc       mark                    20 06-MAR-09   65000             90000
Jeffrey       Thomas                  30 27-FEB-10  300000            300000
Wong          Theresa                 30 27-FEB-10   70000            300000
Newton        Frances                    14-SEP-05   75000             75000


In the Lead and Lagging Behind

Its  common requirement to get access to record which is precedes or follow the current row .  By using the lead and lagging function one can obtain the side by side view of current row

SQL> select last_name, first_name, department_id, hire_date,    
  2         LAG(hire_date, 1, null) over (partition by department_id
  3                                 order by hire_date) prev_hire_date
  4    from employee
  5  order by department_id, hire_date, last_name, first_name;

LAST_NAME     FIRST_NAME              DEPARTMENT_ID HIRE_DATE PREV_HIRE
————————— —————————————— —————————————————————————— ————————— —————————
Eckhardt      Emily                              10 07-JUL-04
Newton        Donald                             10 24-SEP-06 07-JUL-04
Friedli       Roger                              10 16-MAY-07 24-SEP-06
James         Betsy                              10 16-MAY-07 16-MAY-07
Michaels      Matthew                            10 16-MAY-07 16-MAY-07
Dovichi       Lori                               10 07-JUL-11 16-MAY-07
peterson      michael                            20 03-NOV-08
leblanc       mark                               20 06-MAR-09 03-NOV-08
Jeffrey       Thomas                             30 27-FEB-10
Wong          Theresa                            30 27-FEB-10 27-FEB-10
Newton        Frances                               14-SEP-05

11 rows selected.

LAG(column | expression, offset, default)

Offset is a positive integer that defaults to a value of 1. This parameter tells the LAG function how many previous rows it should go back. A value of 1 means, “Look at the row immediately preceding the current row within the current window.” Default is the value you want to return if the offset value (index) is out of range for the current window. For the first row in a group, the default value will be returned.

RATIO_TO_RAPORT :
Business usage often need to report on percentage .sales  amounts, overall cost and annual  salries.
“What percentage of the total annual salary allotment does each employee receive?” The syntax for the RATIO_TO_REPORT analytic function is 
RATIO_TO_REPORT( column | expression)

Code Listing 9: Use RATIO_TO_REPORT to obtain the percentage of salaries 
SQL> select last_name, first_name, department_id, hire_date, salary, 
round(RATIO_TO_REPORT(salary) over ()*100, 2) sal_percentage
  2    from employee
  3  order by department_id, salary desc, last_name, first_name;

LAST_NAME      FIRST_NAME    DEPARTMENT_ID  HIRE_DATE  SALARY  SAL_PERCENTAGE
———————————  ————————————   —————————————— ——————————  ——————  ——————————————
Dovichi        Lori                     10  07-JUL-11
Eckhardt       Emily                    10  07-JUL-04  100000          10.31
Newton         Donald                   10  24-SEP-06   80000           8.25
Michaels       Matthew                  10  16-MAY-07   70000           7.22
Friedli        Roger                    10  16-MAY-07   60000           6.19
James          Betsy                    10  16-MAY-07   60000           6.19
peterson       michael                  20  03-NOV-08   90000           9.28
leblanc        mark                     20  06-MAR-09   65000            6.7
Jeffrey        Thomas                   30  27-FEB-10  300000          30.93
Wong           Theresa                  30  27-FEB-10   70000           7.22
Newton         Frances                      14-SEP-05   75000           7.73

 Note the analytical function in this query entire set of rows as window , because over doesn't specify any order by clause

or additional windowing clause.

Use RATIO_TO_REPORT to obtain the percentage of salaries, by department 
SQL> select last_name, first_name, department_id, hire_date, salary, 
round(ratio_to_report(salary)
  2         over(partition by department_id)*100, 2) sal_dept_pct
  3    from employee
  4  order by department_id, salary desc, last_name, first_name;

LAST_NAME      FIRST_NAME    DEPARTMENT_ID  HIRE_DATE  SALARY  SAL_DEPT_PCT
——————————  —————————————  ———————————————  —————————  ——————  ————————————
Dovichi        Lori                     10  07-JUL-11
Eckhardt       Emily                    10  07-JUL-04  100000        27.03
Newton         Donald                   10  24-SEP-06   80000        21.62
Michaels       Matthew                  10  16-MAY-07   70000        18.92
Friedli        Roger                    10  16-MAY-07   60000        16.22
James          Betsy                    10  16-MAY-07   60000        16.22
peterson       michael                  20  03-NOV-08   90000        58.06
leblanc        mark                     20  06-MAR-09   65000        41.94
Jeffrey        Thomas                   30  27-FEB-10  300000        81.08
Wong           Theresa                  30  27-FEB-10   70000        18.92
Newton         Frances                      14-SEP-05   75000          100

11 rows selected.
 
 
 In this query the analytical function use the row set from window created on departmentId 
aover defines the window on Department-id