Calculate salary difference between two rows in HIVE

Question

I have a table with below columns-

last_name,    first_name,  department,     salary

I want to calculate list of employees who receive a salary less than 100, compared to their immediate employee with higher salary in the same department. I went to below answer- Compute differences between succesive records in Hadoop with Hive Queries and tried but I think I am doing something wrong as I am new to HIVE.

Below is the query which I am running-

select last_name,first_name, salary from emp where 
100 = LEAD(salary,1) OVER(PARTITION BY department ORDER BY salary)-salary;

Please help me with the solution.

Kaushik Nayak · Answer 1 · 2017-11-12T13:12:19.837

0

Use a case expression.

 SELECT last_name,
       first_name,
       salary
FROM   (SELECT last_name,
               first_name,
               salary,
               CASE
                 WHEN 100 > LEAD(salary, 1)
                              OVER(
                                PARTITION BY department
                                ORDER BY salary) - salary THEN 1
                 ELSE 0
               END sal_flag
        FROM   emp)
WHERE  sal_flag = 1;

edited Nov 12 '17 at 13:12

answered Nov 12 '17 at 13:06

Kaushik Nayak

30,772
5
32
45

I ran this query, it is giving error as- FAILED: ParseException line 15:0 cannot recognize input near 'WHERE' 'sal_flag' '=' in subquery source – R.Gold Nov 12 '17 at 16:06
@R.Gold : It is difficult to simulate Hive queries so as to find what could be the issue. There are no online Hive simulators available. Syntactically the query is correct but cannot understand what is wrong with hive. So, you need to do trial and error and alterations to see if the query works in your environment. – Kaushik Nayak Nov 12 '17 at 16:16

score 0 · Answer 2 · answered Nov 12 '17 at 22:05

Hive enforces every sub query to be given a name. I have just added the name to Kaushik's query. Try this, it will work.

SELECT last_name,
       first_name,
       salary
FROM   (SELECT last_name,
               first_name,
               salary,
               CASE
                 WHEN 100 > LEAD(salary, 1)
                              OVER(
                                PARTITION BY department
                                ORDER BY salary) - salary THEN 1
                 ELSE 0
               END sal_flag
        FROM   employee) v
WHERE  sal_flag = 1;

I personally prefer using WITH clause as opposed to subquery as below. With clauses make the query more readable. Also, they produce better execution plan generally.

WITH sal_view 
AS (SELECT last_name,
               first_name,
               salary,
               CASE
                 WHEN 100 > LEAD(salary, 1)
                              OVER(
                                PARTITION BY department
                                ORDER BY salary) - salary THEN 1
                 ELSE 0
               END sal_flag
        FROM   employee) 
SELECT last_name,
       first_name,
       salary
FROM  sal_view
WHERE  sal_flag = 1;

score 0 · Answer 3 · answered Nov 13 '17 at 03:33

Try

    with temp as(
select last_name,
   first_name,
   department,
   salary,
   LEAD(salary, 1)
          OVER(                             PARTITION BY department
                    ORDER BY salary) as diff

    FROM   emp
    )

    select ast_name,
   first_name,
   department,
   salary
   from temp
    where diff >100

Calculate salary difference between two rows in HIVE

3 Answers3