1

I have a dataset with lead_id, lead_status, effective_datetime. The status can change a few times during one day. In the case where there are multple status entries in one day I need to DATE_ADD 1 Day to the 2nd entry, 2 Days to the 3rd entry, etc until I have one lead_id, lead_status, effective_day record. Here is what I have so far:

First I create a row number:

CREATE TABLE lead_rows AS
SELECT
    lead_id,
    effective_datetime,
    lead_status,
    (
        CASE lead_id
        WHEN @curLead 
        THEN @curRow := @curRow + 1 
        ELSE @curRow := 1 AND @curLead := lead_id END
    ) + 1 AS row_number
FROM lead_history, (SELECT @curRow := 0, @curLead := '') r
ORDER BY lead_id, effective_datetime;

Then I compare current row to previous row to figure if I need to move the current row date ahead by a day:

SELECT
    lead_id,
    lead_status,
    effective_datetime,
    CASE 
        WHEN DATE(a.effective_datetime) = (
            SELECT DATE(effective_datetime) 
            FROM lead_rows 
            WHERE row_number = (a.row_number - 1) AND lead_id = a.lead_id) 
        THEN DATE_ADD(DATE(a.effective_datetime), INTERVAL 1 DAY) 
        ELSE DATE(a.effective_datetime) 
    END as new_date 
FROM lead_rows a;

This works fine except when I have two status entries on adjacent days. Because this isn't recursive it won't continue to push subsequent status entries forward one day. So I could start with:

lead_id    | lead_status    | effective_datetime
---------------------------------------------------
lead_1     | Known          | 2014-01-01 01:00:00
lead_1     | In Contact     | 2014-01-01 14:00:00
lead_1     | Unqualified    | 2014-01-02 12:00:00

But end with:

lead_id    | lead_status    | effective_datetime    | new_date
----------------------------------------------------------------
lead_1     | Known          | 2014-01-01 01:00:00   | 2014-01-01
lead_1     | In Contact     | 2014-01-01 14:00:00   | 2014-01-02
lead_1     | Unqualified    | 2014-01-02 12:00:00   | 2014-01-02

When it should have been:

lead_id    | lead_status    | effective_datetime    | new_date
----------------------------------------------------------------
lead_1     | Known          | 2014-01-01 01:00:00   | 2014-01-01
lead_1     | In Contact     | 2014-01-01 14:00:00   | 2014-01-02
lead_1     | Unqualified    | 2014-01-02 12:00:00   | 2014-01-03

That last status entry needs to be pushed ahead by one day to account for the 2nd entry being pushed but I can't figure out to recursively do this. I've looked into running it through a Cursor but that seems overkill for this but maybe there's no other way...

Anyone have a suggestion on how to accomplish this? I'm running this in MySQL with the ability to set up procedures, but if someone has an option that won't work in MySQL then I can convert this to RedShift. Thanks in advance.

Patrick
  • 29,357
  • 6
  • 62
  • 90
walexnelson
  • 424
  • 2
  • 10
  • One idea is a calendar with all relevant dates (this is a depressingly small table) – Strawberry Dec 04 '14 at 08:31
  • I am going to be joining it to a calendar eventually because I'll need to show what the status was on any day since a lead can sit in one status for multiple days. I figured it would be easier to flatten this history data first (it has ~9,000 records is all) – walexnelson Dec 04 '14 at 15:30

1 Answers1

0

The problem is with how SQL carries out your +1 operation. SQL does not have to do the checks in the order entries are displayed to you..

I imagine an easier approach would be to do the check when entries are added (assuming this date conflict is rare) and have a trigger to modify the new entry as it is inserted to the table.

edit: Update int column in table with unique incrementing values seems to be a similar problem, basically you set a counter i and increment the table with same id same date by i starting from 0. The example in the solution uses the date in i instead.

Community
  • 1
  • 1
  • I agree that would be easier. Unfortunately I don't own the data. I only get a dump of the system and then i need to transform it for reporting. – walexnelson Dec 04 '14 at 15:27
  • If I were pressed for time I'd just create another table and transfer line by line with the trigger in place (since you only have 9000 entries). See my edit for a possible alternative – softwarenewbie7331 Dec 05 '14 at 01:52
  • I ended up writing to another table to adjust the dates with the trigger to flatten out the history records. Given more time I could tweak the solution in your edit to fit my needs though. Thanks! – walexnelson Dec 12 '14 at 08:56