I have a dataset with lead_id, lead_status, effective_datetime. The status can change a few times during one day. In the case where there are multple status entries in one day I need to DATE_ADD 1 Day to the 2nd entry, 2 Days to the 3rd entry, etc until I have one lead_id, lead_status, effective_day record. Here is what I have so far:
First I create a row number:
CREATE TABLE lead_rows AS
SELECT
lead_id,
effective_datetime,
lead_status,
(
CASE lead_id
WHEN @curLead
THEN @curRow := @curRow + 1
ELSE @curRow := 1 AND @curLead := lead_id END
) + 1 AS row_number
FROM lead_history, (SELECT @curRow := 0, @curLead := '') r
ORDER BY lead_id, effective_datetime;
Then I compare current row to previous row to figure if I need to move the current row date ahead by a day:
SELECT
lead_id,
lead_status,
effective_datetime,
CASE
WHEN DATE(a.effective_datetime) = (
SELECT DATE(effective_datetime)
FROM lead_rows
WHERE row_number = (a.row_number - 1) AND lead_id = a.lead_id)
THEN DATE_ADD(DATE(a.effective_datetime), INTERVAL 1 DAY)
ELSE DATE(a.effective_datetime)
END as new_date
FROM lead_rows a;
This works fine except when I have two status entries on adjacent days. Because this isn't recursive it won't continue to push subsequent status entries forward one day. So I could start with:
lead_id | lead_status | effective_datetime
---------------------------------------------------
lead_1 | Known | 2014-01-01 01:00:00
lead_1 | In Contact | 2014-01-01 14:00:00
lead_1 | Unqualified | 2014-01-02 12:00:00
But end with:
lead_id | lead_status | effective_datetime | new_date
----------------------------------------------------------------
lead_1 | Known | 2014-01-01 01:00:00 | 2014-01-01
lead_1 | In Contact | 2014-01-01 14:00:00 | 2014-01-02
lead_1 | Unqualified | 2014-01-02 12:00:00 | 2014-01-02
When it should have been:
lead_id | lead_status | effective_datetime | new_date
----------------------------------------------------------------
lead_1 | Known | 2014-01-01 01:00:00 | 2014-01-01
lead_1 | In Contact | 2014-01-01 14:00:00 | 2014-01-02
lead_1 | Unqualified | 2014-01-02 12:00:00 | 2014-01-03
That last status entry needs to be pushed ahead by one day to account for the 2nd entry being pushed but I can't figure out to recursively do this. I've looked into running it through a Cursor but that seems overkill for this but maybe there's no other way...
Anyone have a suggestion on how to accomplish this? I'm running this in MySQL with the ability to set up procedures, but if someone has an option that won't work in MySQL then I can convert this to RedShift. Thanks in advance.