7

Consider the following table:

CREATE TABLE foo (
    id INT PRIMARY KEY,
    effective_date DATETIME NOT NULL UNIQUE
)

Given a set of dates D, how do you fetch all rows from foo whose effective_date is the greatest value less than each date in D in a single query?

For simplicity, assume that each date will have exactly one matching row.

Suppose foo has the following rows.

---------------------
| id |effective_date|
---------------------
|  0 |    2013-01-07|
---------------------
|  1 |    2013-02-03|
---------------------
|  2 |    2013-04-19|
---------------------
|  3 |    2013-04-20|
---------------------
|  4 |    2013-05-11|
---------------------
|  5 |    2013-06-30|
---------------------
|  6 |    2013-12-08|
---------------------

If you were given D = {2013-02-20, 2013-06-30, 2013-12-19}, the query should return the following:

---------------------
| id |effective_date|
---------------------
|  1 |    2013-02-03|
|  4 |    2013-05-11|
|  6 |    2013-12-08|

If D had only one element, say D = {2013-06-30}, you could just do:

SELECT *
FROM foo
WHERE effective_date = SELECT MAX(effective_date) FROM foo WHERE effective_date < 2013-06-30 

How do you generalize this query when the size of D is greater than 1, assuming D will be specified in an IN clause?

Tom Tucker
  • 11,676
  • 22
  • 89
  • 130
  • Which db are you using? – Bohemian Dec 04 '13 at 22:40
  • I'm using MySQL, but if there is a non-DB specific solution, that'd be great, too. – Tom Tucker Dec 04 '13 at 22:41
  • I added another statement in my update, would you mind taking a look at that? – penjepitkertasku Dec 06 '13 at 01:03
  • Where does `D` come from? Also from database or from script / application? – Ja͢ck Dec 06 '13 at 09:24
  • 1
    To sum up the basic approach, which some of the answers follow, whether consciously or not: if you want to pass a *list* and get a *set* out of it, you need either to pass the list in the form of a set or to convert the list to a set in the process. For you need to have a set to generate another set from it. It's easier with a single value simply because a single value is easily re-usable. A list in this regard is quite another matter. Anyway, SQL is all about sets (primarily), so sets is what you need to ultimately work with here, not lists. – Andriy M Dec 06 '13 at 13:37

3 Answers3

3

Actually, your problem is - that you have a list of values, which will be treated in MySQL as row - and not as a set - in most cases. That is - one of possible solutions is to generate your set properly in application so it will look like:

SELECT '2013-02-20'
UNION ALL
  SELECT '2013-06-30'
UNION ALL
  SELECT '2013-12-19'

-and then use produced set inside JOIN. Also, that will be great, if MySQL could accept static list in ANY subqueries - like for IN keyword, but it can't. ANY also expects rows set, not list (which will be treated as row with N columns, where N is count of items in your list).

Fortunately, in your particular case your issue has important restriction: there could be no more items in list, than rows in your foo table (it makes no sense otherwise). So you can dynamically build that list, and then use it like:

SELECT 
  foo.*, 
  final.period 
FROM 
  (SELECT 
    period, 
    MAX(foo.effective_date) AS max_date 
  FROM 
    (SELECT 
      period 
    FROM 
      (SELECT 
        ELT(@i:=@i+1, '2013-02-20', '2013-06-30', '2013-12-19') AS period 
      FROM 
        foo 
          CROSS JOIN (SELECT @i:=0) AS init) AS dates 
      WHERE period IS NOT NULL) AS list 
      LEFT JOIN foo 
        ON foo.effective_date<list.period 
    GROUP BY period) AS final 
    LEFT JOIN foo 
      ON final.max_date=foo.effective_date

-your list will be automatically iterated via ELT(), so you can pass it directly to query without any additional restructuring. Note, that this method, however, will iterate through all foo records to produce row set, so it will work - but doing the stuff in application may be more useful in terms of performance.

The demo for your table can be found here.

Alma Do
  • 37,009
  • 9
  • 76
  • 105
  • Or instead of `foo` + variable assignment one could just use a [numbers table](http://stackoverflow.com/questions/9751318/creating-a-numbers-table-in-mysql "Creating a “Numbers Table” in mysql"). – Andriy M Dec 06 '13 at 11:38
  • Well, I'm aware, but using original table will be a bit simpler (an I've explained above why it can be used) – Alma Do Dec 06 '13 at 11:47
  • 1
    By the way, it would be nice if you elaborated how ANY could be used here. Let's assume for a moment that the application *can* pass the list of values as a dynamically built `(SELECT value1 UNION ALL SELECT value2 UNION ALL ...)` construct. How would you use ANY to produce the required result? My guess is it is not possible without re-using the value set, but perhaps you've got a solution that could prove me wrong. – Andriy M Dec 06 '13 at 13:13
2

perhaps this can help :

SELECT *
FROM foo
WHERE effective_date IN
(
    (SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-02-20'),
    (SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-06-30'),
    (SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-12-19')
)

result :

---------------------
| id |effective_date|
---------------------
|  1 |    2013-02-03| -- different
|  4 |    2013-05-11|
|  6 |    2013-12-08|

UPDATE - 06 December


create procedure :

DELIMITER $$

USE `test`$$ /*change database name*/

DROP PROCEDURE IF EXISTS `myList`$$

CREATE PROCEDURE `myList`(ilist VARCHAR(100))
BEGIN
    /*var*/
    /*DECLARE ilist VARCHAR(100) DEFAULT '2013-02-20,2013-06-30,2013-12-19';*/
    DECLARE delimeter VARCHAR(10) DEFAULT ',';
    DECLARE pos INT DEFAULT 0;
    DECLARE item VARCHAR(100) DEFAULT '';

    /*drop temporary table*/
    DROP TABLE IF EXISTS tmpList;

    /*loop*/
    loop_item: LOOP
       SET pos = pos + 1;

       /*split*/
       SET item =
       REPLACE(
           SUBSTRING(SUBSTRING_INDEX(ilist, delimeter, pos),
                LENGTH(SUBSTRING_INDEX(ilist, delimeter, pos -1)) + 1),
       delimeter, '');

       /*break*/
       IF item = '' THEN
           LEAVE loop_item;
       ELSE
           /*create temporary table*/
           CREATE TEMPORARY TABLE IF NOT EXISTS tmpList AS (
                    SELECT item AS sdate
               );
       END  IF;
    END LOOP loop_item;

    /*view*/
    SELECT * FROM tmpList;
END$$

DELIMITER ;

call procedure :

CALL myList('2013-02-20,2013-06-30,2013-12-19');

query :

SELECT
   *,
   (SELECT MAX(effective_date) FROM foo WHERE effective_date < sdate) AS effective_date
FROM tmpList

result :

------------------------------
|    sdate    |effective_date|
------------------------------
| 2013-02-20  |  2013-02-03  | 
| 2013-06-30  |  2013-05-11  |
| 2013-12-19  |  2013-12-08  |
0

The bad way first (without ordered analytical functions, or rank/row_number)

sel tmp.min_effective_date, for_id.id
from
 ( 
 Sel crossed.effective_date,max(SRC.effective_date) as min_effective_date
 from
 foo as src
 cross join
 foo as crossed
 where
 src.effective_date <cross.effective_date
 and crossed.effective_date in 
 (given dates here)
 group by 1
 ) tmp inner join foo as for_id on
 tmp.effective_date =for_id.effective_date

Next, with row_number

SEL TGT.id, TGT.effective_date
(Sel id, effective_date, row_number() over(order by effective_date asc) as ordered 
) SRC
INNER JOIN 
(Sel id, effective_date, row_number() over(order by effective_date asc) as ordered ) TGT
on
src.ordered+1=TGT.ordered
where src.effective_date in (given dates)

with ordered analytical functions:

sel f.id, tmp.eff 
foo as f inner join
(SEL ID, max(effective_date) over(order by effective_date asc  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as eff
from foo
) TMP
on f.id = tmp.id
where f.effective_date in (given dates)
and tmp.eff is not null

the queries above assume id needs to be selected, and the ids in the source don't follow the same sequence (eg ascending) as the dates. Otherwise, you can straight away use the ordered analytical function.