10

I am wondering if there is a method to implement SQL analytic functions without using the inbuilt functions.

SELECT *,
    ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rownum,
    DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS denserank,
    RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
FROM emp;
Dale K
  • 25,246
  • 15
  • 42
  • 71
Teja
  • 13,214
  • 36
  • 93
  • 155

3 Answers3

15

Here are the three equivalent expressions:

select emp.*,
       (select count(*)
        from emp emp2
        where emp2.dept_id = emp.dept_id and
              (emp2.salary > emp.salary or
               emp2.salary = emp.salary and emp2.emp_id <= emp.emp_id
              )
       ) as "row_number",
       (select 1 + count(*)
        from emp emp2
        where emp2.dept_id = emp.dept_id and
              emp2.salary > emp.salary 
              )
       ) as "rank",
       (select count(distinct salary)
        from emp emp2
        where emp2.dept_id = emp.dept_id and
              emp2.salary >= emp.salary
       ) as "dense_rank",
from emp;

This assumes the existence of an emp_id to make the rows unique for "row_number".

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    @teja . . . All three of these should work, regardless of whether or not there are duplicates. `row_number()` depends on the existence of a unique key in the table, as described in the answer. – Gordon Linoff Oct 21 '17 at 19:12
  • Is there an another solution to do this without use of co related sub query – Vijiy Feb 10 '19 at 14:50
  • 1
    @Vijiy . . . Questions should be asked as *questions* not in comments, along with sample data, desired results, and a database tag. – Gordon Linoff Feb 10 '19 at 22:01
  • I was looking for an authoritative answer to this precise question (since _Spark Structured Streaming_ does not support windowing/analytic functions on a non-timestamp column) . Lo and behold from the horse's mouth _Gordon Linoff_ – WestCoastProjects Apr 20 '19 at 22:25
  • Ouch- just realized the above use _correlated subqueries_ - which are _not_ supported in _spark structured streaming_ .. https://stackoverflow.com/questions/55777682/how-to-execute-non-timestamp-based-aggregations-in-spark-structured-streaming?noredirect=1&lq=1 – WestCoastProjects Apr 20 '19 at 23:21
3

This would work for all cases

select DEPT_ID, SALARY,

    (select count(*)+1  from emp r where r.SALARY>o.SALARY and r.dept_id=o.dept_id) **rank**,

    (select count(distinct SALARY )+1  from emp r where r.SALARY>o.SALARY and r.dept_id=o.dept_id) *d_rank*,

    (select count(*)+1  from (select x.*,rownum rn from ( select emp.* from emp  order by DEPT_ID asc,salary desc ) x) r where r.rn<o.rn and r.dept_id=o.dept_id) **rownumm**

from (select x.*,rownum rn from ( select emp.* from emp  order by DEPT_ID asc,salary desc ) x) o 

order by DEPT_ID,salary desc;

for rank:- calculated using (count of (values less than current rows)+1

for dense rank:- same as rank (count distinct value less than current rows)+1

row_number:- create the nested query by generating rownum for each row which will be distinct for all rows. Now on top of that do the same logic as rank (count of values greater than previous rownum (rownum of select subquery))+1

Dale K
  • 25,246
  • 15
  • 42
  • 71
Ashish Mishra
  • 510
  • 4
  • 18
2

You can do this with a correlated sub-query.

select dept_id,salary,
(select count(*) from emp e1 where e1.dept_id=e.dept_id and e1.salary>=e.salary) as rnum
from emp e

This works well when there are no ties.

Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
  • How can I implement rownum when there are ties? – Teja Oct 20 '17 at 20:25
  • Row number with ties is a dense rank. – Ross Bush Oct 20 '17 at 20:31
  • @RossBush How can it be a dense rank? If you use ROW_NUMBER() OVER ( PARTITION BY ORDER BY ) it would assign unique numbers even if there are dups.. – Teja Oct 20 '17 at 20:34
  • I meant if you ordered by salary within a department and gave each member of a department a row number writhing that order, however, you kept a placeholder for ties (each tie gets the same row number), then that is row_number with ties by salary and dept_id. – Ross Bush Oct 20 '17 at 20:44