MySql find closest date and the date preceeding and after

Question

So I've scoured the recesses of the interwebs, but I can't find if there's an easy way to do this.

I have a list of magazine dates in a MySql (I know) DB. I need to be able to find the date closest to a given date (there may not be an issue for "1962-5-22", so we need the closest date in the DB) BUT at the same time, get the NEXT date and the PREVIOUS date in the DB, so that I can populate the "NEXT" and "PREVIOUS" buttons on the page. ALL THE WHILE keeping in mind that there's an upper and lower limit to the dates. Whew.

Is this even possible with a relatively simple query?

I think it will be easier if you split it into two queries: one to find the closest date, another to find the dates before and after. You should be able to find questions that answer how to do each step. — Barmar, May 23 '16 at 20:25

score 1 · Answer 1 · answered May 24 '16 at 01:10

You're really complicating things by trying to do this in a single query. Something like this should work to get your nearest date:

SELECT * FROM magazines ORDER BY ABS(TIMESTAMPDIFF(SECOND, issuedate, '1962-05-22')) LIMIT 1;

And then you have a date and can get the other two:

SELECT id FROM magazines WHERE issuedate > $current_date LIMIT 1;
SELECT id FROM magazines WHERE issuedate < $current_date LIMIT 1;

The amount of complexity you remove from your code is likely to be more valuable than the small time savings you'd get by executing a single query.

EagleRainbow · Answer 2 · 2016-05-23T21:14:03.917

Tricky, challanging, but feasible :)

First of all, I tried to set up your scenario as slim as possible with the following statements:

create table issues (
   issueon date not null primary key 
);

insert into issues values('2015-01-10'), ('2015-01-15'), ('2014-10-25');
insert into issues values('2015-01-08'), ('2015-01-20'), ('2014-1-10');

(Without loss of generality, I am only doing this for a single magazine).

Here's the complete SELECTstatement with which I came up after some testing with assuming that I am trying to detect the current, the previous and the next issue date around '2015-01-11':

select tableNext.theIssue, theNext, thePrevious
from 
(
    select basis.issueon as theIssue, target.issueon as thePrevious, datediff(basis.issueon, target.issueon) as surroundingdistance
    from issues as basis, issues as target
    where basis.issueon = (
        select closestissue from ( 
            select issueon as closestissue, abs(datediff('2015-01-11', issueon)) as distance from issues order by distance asc limit 1 
        ) as theissue
    ) and datediff(basis.issueon, target.issueon) > 0 
    order by surroundingdistance limit 1
) as tableNext
left join 
(
    select basis.issueon as theIssue, target.issueon as theNext, datediff(basis.issueon, target.issueon) as surroundingdistance
    from issues as basis, issues as target
    where basis.issueon = (
        select closestissue from ( 
            select issueon as closestissue, abs(datediff('2015-01-11', issueon)) as distance from issues order by distance asc limit 1 
        ) as theissue
    ) and datediff(basis.issueon, target.issueon) < 0 
    order by surroundingdistance desc limit 1
) as tablePrevious
on tableNext.theIssue = tableNext.theIssue
;

Don't panic - it's a monster and I'll try to explain it step by step:

Innerst Statement -- First Layer

Let's first have a look at the inner-most select statement:

select issueon as closestissue, abs(datediff('2015-01-11', issueon)) as distance 
from issues order by distance asc limit 1

The statement computes the "distance" between the dates of all issues and the date where we started. To ensure that we are looking both into the forward and into the backward direction, the absolute value of the distance is taken. Using order by distance limit 1 we are determining the "nearest match" (NB: This could also be done using GROUP BY and HAVING clauses, but this is much less liked by MySQL than LIMIT 1). The value that is "the best match" then can be found in attribute closestissue.

Second layer

Now let's have a look at

select closestissue from ( 
            select issueon as closestissue, abs(datediff('2015-01-11', issueon)) as distance from issues order by distance asc limit 1 
        ) as theissue

With this one, we are just stripping away the attribute distance as it will be hindering us in a second

Third layer

On the third layer, we are taking the result of the second layer and use that one as input on a WHERE condition to select the "right" issue from the issues table again:

select basis.issueon as theIssue, target.issueon as theNext, datediff(basis.issueon, target.issueon) as surroundingdistance
    from issues as basis, issues as target
    where basis.issueon = (
        select closestissue from ( 
            select issueon as closestissue, abs(datediff('2015-01-11', issueon)) as distance from issues order by distance asc limit 1 
        ) as theissue
    )

Afterwards, we are joining the same table again (self-join) in a cartesian product. Note, that the left table of the join only may contain a single record (assuming that there is only one issue per day, which I hope is safe). Therefore, this cartesian product is not risky, as it will only return as many records as we anyhow have in the table issues (so we stay in complexity O(n)). The free hand-side of this right (the "right side") is called target in the query.

We are now able to compare the "distance" from the "proper issue date" with all dates in the table again. We do so by calculating datediff(basis.issueon, target.issueon) as surroundingdistance again.

Previous and Next

Once we have the sourroundingdistance it's getting easy to apply the same pattern as on the first layer. However, this time we may not just take the absolute distance, but we have to look at two distinct sets:

Those which were earlier (sourroundingdistance < 0), and
those which were later (sourroundingdistance > 0)

Doing the same trick now again with ORDER BY + LIMIT 1, we can isolate a single issue. However, as we have to consider those two sets seperately, we have to execute the same query (with some minor modifications) twice -- that's why that beast gets so long-ish.

Putting it all together

We now have two queries, which

contain the corresponding issue date, and
indicate the previous respectively the next issue date.

The common attribute for these two queries is the corresponding issue date. Therefore, we can join these two queries with each other via this common attribute.

And voilà: The "monster" query provides the following desired result:

theIssue    theNext     thePrevious
2015-01-10  2015-01-15  2015-01-08

Amendment

Though I had a brief look at the execution plan (EXPLAIN) of the query and was astonished that the complexity still was quite limited, from a maintenance perspective such a query is a nightmare.

If you are able and you may afford to execute multiple statements sequentially, there might be better approaches than firing such a huge query to the server. For example, you could fire a first query containing the content of the first layer, getting the result and using that one as input for a second query, with which you are determining the "next issue". A third query could retrieve the "previous issue".

Also, if your environment permits it, you may consider writing a function (CREATE FUNCTION) for it, which encapsulates all this in a more sequential way.

All these are just variants of the same approach. In the end, I hope I could equip you with the necessary "weapons" to tackle this issue using SQL.

Kudos go to http://stackoverflow.com/questions/13026334/get-closest-date-from-mysql-table — EagleRainbow, May 23 '16 at 21:16