4

So I was taking a test recently with some higher level SQL problems. I only have what I would consider "intermediate" experience in SQL and I've been working on this for a day or so now. I just can't figure it out.


Here's the problem:

You have a table with 4 columns as such:

EmployeeID     int unique
EmployeeType   int  
EmployeeSalary int  
Created        date

Goal: I need to retrieve the difference between the latest two EmployeeSalary for any EmployeeType with more than 1 entry. It has to be done in one statement (nested queries are fine).


Example Data Set: http://sqlfiddle.com/#!9/0dfc7

EmployeeID | EmployeeType | EmployeeSalary | Created
-----------|--------------|----------------|--------------------
1          | 53           | 50             | 2015-11-15 00:00:00
2          | 66           | 20             | 2014-11-11 04:20:23
3          | 66           | 30             | 2015-11-03 08:26:21
4          | 66           | 10             | 2013-11-02 11:32:47
5          | 78           | 70             | 2009-11-08 04:47:47
6          | 78           | 45             | 2006-11-01 04:42:55

So for this data set, the proper return would be:

EmployeeType | EmployeeSalary 
-------------|---------------
66           | 10
78           | 25

The 10 comes from subtracting the latest two EmployeeSalary values (30 - 20) for the EmployeeType of 66. The 25 comes from subtracting the latest two EmployeeSalary values (70-45) for EmployeeType of 78. We skip EmployeeID 53 completely because it only has one value.

This one has been destroying my brain. Any clues?

Thanks!

SalsaGuy
  • 43
  • 4

2 Answers2

3

How to make really simple query complex?

One funny way(not best performance) to do it is:

SELECT final.EmployeeType, SUM(salary) AS difference
FROM (
  SELECT b.EmployeeType, b.EmployeeSalary AS salary
  FROM tab b
  JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
        FROM tab
        GROUP BY EmployeeType
        HAVING COUNT(*) > 1) AS sub
    ON b.EmployeeType = sub.EmployeeType
    AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 1
  UNION ALL
  SELECT b.EmployeeType, -b.EmployeeSalary AS salary
  FROM tab b
  JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
        FROM tab
        GROUP BY EmployeeType
        HAVING COUNT(*) > 1) AS sub
    ON b.EmployeeType = sub.EmployeeType
    AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 2
) AS final
GROUP BY final.EmployeeType;

SqlFiddleDemo

EDIT:

The keypoint is MySQL doesn't support windowed function so you need to use equivalent code:

For example solution in SQL Server:

SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary 
                                 ELSE -EmployeeSalary END) AS difference
FROM (SELECT *,
       ROW_NUMBER() OVER(PARTITION BY EmployeeType ORDER BY Created DESC) AS rn
      FROM #tab
     ) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1

LiveDemo

And MySQL equivalent:

SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary 
                          ELSE -EmployeeSalary END) AS difference
FROM (
       SELECT t1.EmployeeType, t1.EmployeeSalary,
        count(t2.Created) + 1 as rn
      FROM #tab t1
      LEFT JOIN #tab t2
        ON t1.EmployeeType = t2.EmployeeType
       AND t1.Created < t2.Created
      GROUP BY t1.EmployeeType, t1.EmployeeSalary
     ) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1;

LiveDemo2

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • Hey lad2025! I'm also getting timeouts on fiddle, but I tested the code locally and it worked! Thanks for all your help. I'm going to comb through it now to try and understand what I missed. You're the best! – SalsaGuy Nov 23 '15 at 18:31
  • @SalsaGuy This code is for fun, you should not use it in production because there exists better ways (magnitude faster) :) – Lukasz Szozda Nov 23 '15 at 18:32
  • I've been doing software dev for about a decade and didn't do much SQL until recently. I didn't know SQL could even be this difficult. When I started doing challenges I was amazed at how powerful it is. I'll use your code to learn what's goin on and try to optimize it. Thanks again for your help! – SalsaGuy Nov 23 '15 at 18:45
  • You might want to look up how to emulate LEAD/LAG in mysql to optimize your query. [This](http://stackoverflow.com/questions/11303532/simulate-lag-function-in-mysql) is one example. – Julien Blanchard Nov 23 '15 at 18:53
  • @JulienBlanchard Yes, this can be optimized in many ways, the quickest is to emulate by `ROW NUMBER`. I don't like MySQL because it lacks of windowed function. **[SQL Server Demo](https://data.stackexchange.com/stackoverflow/query/397237)** Working on emulation :) – Lukasz Szozda Nov 23 '15 at 18:54
  • nice job lad as always – Drew Nov 23 '15 at 19:27
3

The dataset of the fiddle is different from the example above, which is confusing (not to mention a little perverse). Anyway, there's lots of ways to skin this particular cat. Here's one (not the fastest, however):

SELECT a.employeetype, ABS(a.employeesalary-b.employeesalary) diff
  FROM 
     ( SELECT x.*
            , COUNT(*) rank 
         FROM employees x 
         JOIN employees y 
           ON y.employeetype = x.employeetype 
          AND y.created >= x.created 
        GROUP
           BY x.employeetype
            , x.created
     ) a
  JOIN
     ( SELECT x.*
            , COUNT(*) rank 
         FROM employees x 
         JOIN employees y 
           ON y.employeetype = x.employeetype 
          AND y.created >= x.created 
        GROUP
           BY x.employeetype
            , x.created
     ) b
    ON b.employeetype = a.employeetype
   AND b.rank = a.rank+1
 WHERE a.rank = 1;

a very similar but faster solution looks like this (although you sometimes need to assign different variables between tables a and b - for reasons I still don't fully understand)...

SELECT a.employeetype
     , ABS(a.employeesalary-b.employeesalary) diff
  FROM 
     ( SELECT x.* 
            , CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
            , @prev := x.employeetype prev
         FROM employees x
            , (SELECT @prev := 0, @i:=1) vars
        ORDER 
           BY x.employeetype
            , x.created DESC
     ) a
  JOIN
     ( SELECT x.* 
            , CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
            , @prev := x.employeetype prev
         FROM employees x
            , (SELECT @prev := 0, @i:=1) vars
        ORDER 
           BY x.employeetype
            , x.created DESC
     ) b
    ON b.employeetype = a.employeetype
   AND b.i = a.i + 1
 WHERE a.i = 1;
Strawberry
  • 33,750
  • 13
  • 40
  • 57