0

I have a tricky situation.

I have a source dataset; it has data for four employees and their departments based on an effective date.

I need to convert this source dataset to the destination dataset.

enter image description here

Both datasets are properly sorted by EmployeeName and EffectiveDate (ASC) order.

(Please see T-SQL scripts using temp table.)

CREATE TABLE #source

(

EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100)

);

INSERT INTO #source

VALUES

('Lisa','2017-06-25','Catering'),
('Lisa','2018-08-17',NULL),
('Lisa','2021-12-05','Gardening'),
('Melissa','2015-08-27',NULL),
('Melissa','2017-11-29','Office'),
('Melissa','2020-10-10','Driving'),
('Melissa','2022-07-11',NULL),
('Omar','2019-01-03',NULL),
('Omar','2020-04-07','Retail'),
('Omar','2021-03-29',NULL),
('Pat', '2012-09-12','Laundry'),
('Pat', '2013-10-30',NULL),
('Pat', '2014-11-29',NULL),
('Pat', '2015-08-16',NULL),
('Pat', '2016-11-05',NULL)


CREATE TABLE #destination

(

EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100),
PreviousNonNULLDepartmentIfAvailable varchar(100)

);

INSERT INTO #destination

VALUES

('Lisa','2017-06-25','Catering',NULL),
('Lisa','2018-08-17',NULL,'Catering'),
('Lisa','2021-12-05','Gardening','Catering'),
('Melissa','2015-08-27',NULL,NULL),
('Melissa','2017-11-29','Office',NULL),
('Melissa','2020-10-10','Driving','Office'),
('Melissa','2022-07-11',NULL,'Driving'),
('Omar','2019-01-03',NULL,NULL),
('Omar','2020-04-07','Retail',NULL),
('Omar','2021-03-29',NULL,'Retail'),
('Pat', '2012-09-12','Laundry',NULL),
('Pat', '2013-10-30',NULL,'Laundry'),
('Pat', '2014-11-29',NULL,'Laundry'),
('Pat', '2015-08-16',NULL,'Laundry'),
('Pat', '2016-11-05',NULL,'Laundry')



SELECT *
FROM #source
ORDER BY EmployeeName, EffectiveDate


SELECT *
FROM #destination
ORDER BY EmployeeName, EffectiveDate

In the destination dataset, I need one new column called [PreviousNonNULLDepartmentIfAvailable].

What is the logic to derive this above new column?

I need to get each individual's most recent (previous) department; it is easy to use a LAG function to get the most recent (previous) department. See T-SQL code below:

PreviousNonNULLDepartmentIfAvailable = LAG(CurrentDepartment) OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate)

However, I need the most recent (previous) non-NULL department; if there is no such "most recent (previous) non-NULL" department value within the PARTITION of EmployeeName, then I need to show NULL.

I have tried options such as LAG, LAST_VALUE, IGNORE NULLS clause, UNBOUNDED PRECEDING clause. These options are close to what I need, but NOT exactly what I need.

Effectively, I need to get what a LAG function would perform; but the offset value for LAG function has to be dynamic, instead of a static value such as 1 or 2 or 3...; the LAG function needs to iterate (backwards) as many rows as needed to catch the most recent (previous) non-NULL department value, within a PARTITION of EmployeeName.

This said, the column [PreviousNonNULLDepartmentIfAvailable] can still have NULL values, if there is no such "most recent (previous) non-NULL" department value available within a PARTITION of EmployeeName.

Also, the first row based on ascending order of Effective Date of each partition of EmployeeName will always have NULL as its [PreviousNonNULLDepartmentIfAvailable] value (obviously). This is natural in the way LAG function works.

Any idea on how to convert the source dataset to destination dataset ?

user3812887
  • 439
  • 12
  • 33

2 Answers2

0

A combination of LAG, ROW_NUMBER, a Flag to check for CurrentDepartment with NULL values, a subquery to get the first row number with a CurrentDepartment that IS NOT NULL (based on RN and Flag), and a CASE statement to check if the Lag_Value IS NULL should do the job here:

WITH CTE AS (SELECT
               EmployeeName, EffectiveDate, CurrentDepartment,
               LAG(CurrentDepartment) OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate) AS Lag_Value,
               ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate) AS RN,
               CASE WHEN CurrentDepartment IS NOT NULL THEN 1 ELSE 0 END AS Flag 
             FROM #source
)
SELECT EmployeeName, EffectiveDate, CurrentDepartment,
  CASE 
    WHEN Lag_Value IS NULL THEN PreviousDepartment
    ELSE Lag_Value 
  END AS PreviousNonNULLDepartmentIfAvailableFROM 
FROM (SELECT *,
  (SELECT TOP(1) CurrentDepartment FROM CTE WHERE Flag = 1 
  AND RN < a.RN AND EmployeeName = a.EmployeeName) AS PreviousDepartment
  FROM CTE a) a

Fiddle here.

UPDATED Fiddle here.

griv
  • 2,098
  • 2
  • 12
  • 15
  • thanks very much; i will test it and get back on Friday; it looks perfect. – user3812887 Oct 28 '22 at 02:25
  • Would using a MAX function instead of MIN function change the output ? (This is just to understand the concept of PARTITION BY and ORDER BY inside a MIN function, something that I don't do) – user3812887 Oct 28 '22 at 03:50
  • Good question, its based on string length, so if you swap with `MAX` it will not work. Check my answer against your live data, and see if it works because if you swap `Gardening` with `Catering` for `Lisa`'s data, you may not get the results that you wanted. If you're only working with the supplied sample data, this will work though. I will look into it for you. – griv Oct 28 '22 at 16:03
  • I already checked with MIN and MAX, your code (MIN) is right; just wanted to understand the concept – user3812887 Oct 28 '22 at 16:16
  • Gotcha. Here's a good previous question on the topic: [SQL SERVER - Understanding how MIN(text) works](https://stackoverflow.com/questions/4503703/sql-server-understanding-how-mintext-works). – griv Oct 28 '22 at 16:29
  • the code gives wrong output in the actual database; it looks like we are depending on the alphabetical order of CurrentDepartment, to determine the [Min_Value] – user3812887 Oct 28 '22 at 19:50
  • In the sample I have put here, Lisa's [Min_Value] on Dec 5, 2021 is Catering; while in the actual database (where there are many departments), I see the [Min_Value] as Gardening, which is the Current Department – user3812887 Oct 28 '22 at 19:54
  • Gotcha. That’s what I figured (as I mentioned above). I’ll see if I can try to work on it more, or maybe you’ll get a better answer in the meantime. – griv Oct 28 '22 at 19:54
  • thanks very much; the logic should NOT be have any dependency on the string value of Current Department (this is one rule, I think we need to keep in mind); say, a string might have 10 alphabets, another might have 5 or 20.... one might start with 'A', the other with 'B', another with 'Z'; the code should be fool-proof, regardless of what the string value is for the Current Department column. – user3812887 Oct 28 '22 at 19:57
  • Can we use something like RANK() or ROWNUMBER ? – user3812887 Oct 28 '22 at 20:00
  • I've added `ROW_NUMBER()` and a subquery (as well as an updated fiddle) to account for this problem. See my updated answer. Check it out and let me know if it works. – griv Oct 28 '22 at 20:56
  • I have figured out; thanks for your help. https://dbfiddle.uk/uOQZQKre; https://stackoverflow.com/questions/67179501/sql-show-the-previous-value-until-the-given-value-changes – user3812887 Oct 29 '22 at 00:05
  • OUTER APPLY AND TOP 1 DESC is the technique I used to fix the issue; but appreciate your effort very much – user3812887 Oct 29 '22 at 00:06
  • Does my solution work as well? Just curious. – griv Oct 29 '22 at 00:46
  • I did not try your solution that has ROW NUMBER; I will try on Monday; in the mean time, I have posted a solution – user3812887 Oct 29 '22 at 02:57
  • Nice. I like your solution. Good idea with the `CROSS APPLY` looks like they both work: https://dbfiddle.uk/GcKz5XCZ – griv Oct 29 '22 at 03:19
0
SELECT

a.*,

c.PreviousNonBlankCurrentDepartment AS PreviousNonNULLDepartmentIfAvailable

FROM #source a

OUTER APPLY

(

SELECT TOP 1 b.CurrentDepartment AS PreviousNonBlankCurrentDepartment

FROM #source b
WHERE b.EmployeeName = a.EmployeeName
AND b.EffectiveDate < a.EffectiveDate
AND b.CurrentDepartment IS NOT NULL

ORDER BY b.EffectiveDate DESC

) c
user3812887
  • 439
  • 12
  • 33