2

I have a table in SQL Server where user is allowed to make changes to the employee's details. Every time a new record is placed in the EMPLOYEE_HIST table. Only the EMP_ID is kept constant for the employee, and all other details are modifiable.

Also there the is a SEQ_NO column which maintains the sequence of entries made.

EMPLOYEE_HIST:

SEQ_NO  EMP_ID   SOME_VAL1   SOME_VAL2
1       E1       V11          V21     (initial value of this employee)
2       E2       V12          V22     (initial value of this employee)
3       E3       V13          V23     (initial value of this employee)
4       E2       V00          V22
5       E1       V01          V21
6       E2       V02          V22
7       E4       V00          V00     (initial value of this employee)

I want a query which will give me changes made to particular employees, something like

EMP_ID    SOME_VAL1_OLD    SOME_VAL1_NEW    SOME_VAL2_OLD  SOME_VAL2_NEW
E1        V11              V01              V21            V21
E2        V12              V00              V22            V22
E2        V00              V02              V22            V22

UPDATE Also employee details may be modified by user n number of times and for each change, a row should be present in the result set. Please help.

EDIT: I finally settled with using LAG function. It will work like this:

SELECT *,ROW_NUMBER() OVER(PARTITION BY EMP_ID,CHANGE_NO ORDER BY EMP_ID,CHANGE_NO,SEQ_NO)  
FROM(
SELECT * FROM EMPLOYEE_HIST( SELECT LAG(SOME_VAL1) 
OVER(PARTITION BY EMP_ID ORDER BY EMP_ID,SEQ_NO) AS OLD_VAL, SOME_VAL1 AS NEW_VAL, '1' AS CHANGE_NO) T 
WHERE OLD_VAL<>NEW_VAL UNION ALL
SELECT * FROM EMPLOYEE_HIST( SELECT LAG(SOME_VAL1) OVER(PARTITION BY EMP_ID ORDER BY EMP_ID,SEQ_NO) AS OLD_VAL, SOME_VAL2 AS NEW_VAL, '2' AS CHANGE_NO) T
WHERE OLD_VAL<>NEW_VAL) TEMP

But the performance is terribly slow for fetching total 500 rows on the table containing 3 million records. Please give some suggestions to improve sorting cost.

3 Answers3

1

You can use a CTE with a Window function if you're using 2008 or newer:

;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
    , EMP_ID
    , SOME_VAL1
    , SOME_VAL2 
FROM EMPLOYEE_HIST
) 
SELECT e.EMP_ID
    , s2.SOME_VAL1 [SOME_VAL1_OLD]
    , s1.SOME_VAL1 [SOME_VAL1_NEW]
    , s2.SOME_VAL2 [SOME_VAL2_OLD]
    , s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN r AS s1 ON e.EMP_ID = s1.EMP_ID and s1.rank = 1 --the last change
LEFT JOIN r AS s2 ON e.EMP_ID = s2.EMP_ID and s2.rank = 2 --the second to last change

If you want all of the changes, not just the top two, then you should be able to do something like this:

;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
    , EMP_ID
    , SOME_VAL1
    , SOME_VAL2 
FROM EMPLOYEE_HIST
) 
SELECT e.EMP_ID
    , s2.SOME_VAL1 [SOME_VAL1_OLD]
    , s1.SOME_VAL1 [SOME_VAL1_NEW]
    , s2.SOME_VAL2 [SOME_VAL2_OLD]
    , s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN (r AS s1 --the change
    INNER JOIN r AS s2 ON s1.EMP_ID = s2.EMP_ID and s2.rank = s1.rank + 1) --previous value
ON e.EMP_ID = s1.EMP_ID

This should enumerate all changes until it encounters the original value.

Kateract
  • 822
  • 6
  • 15
  • This is quite useful, but I dont want comparision of rank 2 and rank 1. The intension is to compare rank 1 and 2, 2 and 3, 3 and 4 and so on for each EMP_ID. – Ajinkya Deshmukh Jun 27 '16 at 06:21
  • @AjinkyaDeshmukh I updated my answer with a second part and changed the names just a bit to make them more consistent and understandable, let me know if this helps you – Kateract Jun 27 '16 at 16:39
0

You may be better off with a different data model. You could have a table EMPLOYEE_HIST_OLD that contains the identical data structure. This would allow you to archive the former data (even with a timestamp and/or sequence number), keep the size of the EMPLOYEE_HIST table smaller and w/o data you would not reference regularly, etc. This would allow for a basic join statement between the two tables.

I would then suggest you use the timestamp of the EMPLOYEE_HIST_OLD records to find the most recent modifications, then join those records back to the current records. This will only present to you the changed records. You could limit the query on EMPLOYEE_HIST_OLD to simply return one record (most recent) if you like. SQL query to get most recent row for each instance of a given key

If you must stay within the same EMPLOYEE_HIST table for everything and use the sequence number approach you may wish to use a count() to find changed records for a particular Employee ID and return the values ORDERED by sequence number. You could also limit the query to employees with count > 1. You would then view the data vertically in the table, though. To parse the values into separate columns like VAR1_OLD and VAR1 essentially would require you to only read the last two values and make one record out of two. You lose the visibility of all the changes when trying to view the data horizontally. There could be more than one historical change. To view the records horizontally would require you to do some array manipulation outside of SQL after the data was returned from the query.

For info on counting: SQL query for finding records where count > 1

Community
  • 1
  • 1
0

You could use a CTE to get a partitioned row number, by EMP_ID. Then join that against itself where the row number is offset by 1.

;WITH PartitionedRows
AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY SEQ_NO) AS RowID, EMP_ID, SOME_VAL1,SOME_VAL2
    FROM EMPLOYEE_HIST
)
SELECT a.EMP_ID,b.SOME_VAL1 AS SOME_VAL1_OLD,a.SOME_VAL1 AS SOME_VAL1_NEW,b.SOME_VAL2 AS SOME_VAL2_OLD,a.SOME_VAL2  AS SOME_VAL2_NEW
FROM PartitionedRows a
LEFT JOIN PartitionedRows b ON a.EMP_ID = b.EMP_ID AND a.RowID = (b.RowID + 1)
WHERE b.RowID IS NOT NULL
TLaV
  • 389
  • 2
  • 4