1

I have a table of customer visits.

For each each customer, I want to compare the first record's post_vst_dt to the vst_beg_dt of the second record.

If the second visit record's vst_beg_dt is greater than the first record's post_vst_dt then the both records need to be selected else the first record's post_vst_dt has to be compared to the third record's vst_beg_dt.

Based on the same conditional check either both the first and third record has to be selected or just the first record.

This needs to be checked for all customers and each customer can have any number of visits.

Can this be accomplished using SQL or a mysql procedure with loop and cursor be use. Can someone help please.

| Visit_ID | Cust_ID | Vst_Beg_Dt | Vst_End_Dt | Post_vst_Dt |
|----------+---------+------------+------------+-------------|
|      445 |     545 | 2015-11-23 | 2015-11-28 | 2016-01-12  |
|      198 |     545 | 2016-01-07 | 2016-01-12 | 2016-02-26  |
|      271 |     545 | 2016-01-19 | 2016-01-29 | 2016-03-14  |
|      841 |     291 | 2015-09-08 | 2015-09-12 | 2015-10-27  |
|      987 |     291 | 2015-12-19 | 2015-12-23 | 2016-02-06  |
|      211 |     291 | 2015-12-26 | 2016-01-06 | 2016-02-20  |
|      122 |     291 | 2016-02-25 | 2016-02-27 | 2016-04-12  |

The expected output is

| Visit_ID | Cust_ID | Vst_Beg_Dt | Vst_End_Dt |
|----------+---------+------------+------------|
|      445 |     545 | 2015-11-23 | 2015-11-28 |
|      271 |     545 | 2016-01-19 | 2016-01-29 |
|      841 |     291 | 2015-09-08 | 2015-09-12 |
|      987 |     291 | 2015-12-19 | 2015-12-23 |
|      122 |     291 | 2016-02-25 | 2016-02-27 |

By 'first', I mean the record with the min vst_beg_dt, per customer

Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • What record should be considered the first, and what - the second? – user4035 May 17 '16 at 20:20
  • Why is Post_vst_Dt stored in a string format and not as a date? – user4035 May 17 '16 at 20:22
  • first record is the one with the min vst_beg_dt; first, second, third record for a customer are ordered by vst_beg_dt – user5104018 May 17 '16 at 20:24
  • the Post_vst_Dt date format can be changed, in the sample it is an error – user5104018 May 17 '16 at 20:25
  • http://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column?rq=1 shows how to find the first row for each customer – Barmar May 17 '16 at 20:51
  • Is it always comparing with either the 2nd or 3rd visit, or should it keep repeating until it finds the first visit whose `vst_beg_dt` is different from the first one? – Barmar May 17 '16 at 20:52
  • @Barmar - Thanks. I am looking into the link you shared, I have tried correlated subqueries but it doesn't seem to work. – user5104018 May 18 '16 at 12:36
  • The comparison must keep repeating until all records of every customer where the vst_beg_dt is greater than the post_vst_dt of the previous record are fetched. The first record for every customer is always selected, but the further visits are selected only if the vst_beg_dt is greater than the post_vst_dt of the previously selected record. – user5104018 May 18 '16 at 12:46
  • For example, for cust_id 291, 1st visit record is selected, the 2nd visit record is selected since 2015-12-19 is > than 2015-10-27. Now the 2nd visit record must be compared against the 3rd record and since the condition fails 3rd record is dropped and 2nd and 4th must be compared. – user5104018 May 18 '16 at 12:46

1 Answers1

1

I think this is a similar question to How to do the Recursive SELECT query in MySQL?.

I like leftclickben's answer to that problem. Here's a reworking of that query that I think works for your situation.

SELECT
    v.visit_id, @cust := v.cust_id cust_id, v.vst_beg_dt, v.vst_end_dt,
    @post := v.post_vst_dt post_vst_dt
FROM (SELECT * FROM visit ORDER BY cust_id, vst_beg_dt) v
JOIN (SELECT @cust := 0, @post := NULL) tmp
WHERE v.cust_id != @cust OR v.vst_beg_dt > @post;

Essentially you order the results so that each customer's visits are together and in date order. Then on each row update the @cust and @post variables to the cust_id and the post_vst_dt respectively. Finally filter the results to only include rows which either do not have the same cust_id as the previous row (the earliest entry for a customer) or have a vst_beg_dt greater than the previous post_vst_dt.

I'm actually a little bit surprised that this works but it seems to return the correct values for your sample data.

Community
  • 1
  • 1
Matt Raines
  • 4,149
  • 8
  • 31
  • 34
  • Thank you so much Matt! This does work. I was thinking that a cursor has to be used to loop through. My only question would if there is more than 1 key other that cust id, how will you need to modify ? – user5104018 May 20 '16 at 13:14
  • I'm not sure. Can you edit your question with the specifics of the other key column? – Matt Raines May 20 '16 at 13:15
  • Ok will do. If possible can you please explain how this simple query works recursively? Esp. I am not able to understand the where clause with or condition. Thanks. – user5104018 May 20 '16 at 14:08
  • As I understand it, it's not really recursive. As it iterates over the selection, for each row it selects, it sets `@cust` to that row's `cust_id` and `@post` to that row's `post_vst_dt`. Then the `WHERE` clause filters out the subsequent record if it has the same `cust_id` and the `vst_beg_dt` is earlier than the previously selected row's `post_vst_dt`. The initial `SELECT @cust := 0, @post := NULL` "table" is required because when the `WHERE` clause first runs, there won't be a previous row. Just make sure no customers have `cust_id` of `0`! – Matt Raines May 20 '16 at 15:33
  • No problem. It's nice to hear I could help with your problem, and I learned something new to boot, which is always awesome. The established way to say "thanks" at Stack Overflow is to upvote or accept the answer. You can only accept one answer to each of your questions, which removes it from the "unanswered" questions list and gives the user who posted the answer some reputation points. – Matt Raines May 20 '16 at 16:26
  • Yes, I did try upvote your answer but I get a message saying ''Once you earn a total of 15 reputation, your votes will change the publicly displayed post score'. I am not sure what that means. – user5104018 May 20 '16 at 16:54
  • Ah, fair enough. [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers) does say that you have to have sufficient reputation to upvote, I'd forgotten that. Not to worry, I'm glad I was able to help. Keep posting interesting questions :) – Matt Raines May 20 '16 at 17:02