0

I have looked for quite some time for an answer to this. It seems like it should be simple but I'm not sure it is ...

SETUP:

A database has two tables: my_contacts and my_sales.

Each contact can have one or more sales. Each sale has a creation date and can be open or closed.

CHALLENGE:

I want to generate a single query to return a joined results set which has only one record for each contact as follows:

  • If there are any open sales for the contact, return the sale that was most recently created.
  • If there are no open sales, return the sale that was most recently created.

I tried sorting by is_open, create_date_time and then grouping by contact_id - but Group By states that it chooses its records before any sorting takes place so it doesn't work.

DATABASE DETAILS:

CREATE TABLE `my_contacts` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`id`));

CREATE TABLE `my_sales` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `contact_id` INT(11) NOT NULL,
  `description` VARCHAR(45) NOT NULL,
  `is_open` INT(1) NOT NULL,
  `create_date_time` DATETIME NOT NULL,
  PRIMARY KEY (`id`));

INSERT INTO `my_contacts` (`name`) VALUES ('Jim');
INSERT INTO `my_contacts` (`name`) VALUES ('Jane');
INSERT INTO `my_contacts` (`name`) VALUES ('Roger');
INSERT INTO `my_contacts` (`name`) VALUES ('Alice');

INSERT INTO `my_sales` VALUES (NULL, '2', 'Books', '0', '2017-09-06');
INSERT INTO `my_sales` VALUES (NULL, '3', 'Toys', '0', '2017-06-21');
INSERT INTO `my_sales` VALUES (NULL, '2', 'Groceries', '1', '2017-05-06');
INSERT INTO `my_sales` VALUES (NULL, '1', 'Water', '0', '2016-09-21');
INSERT INTO `my_sales` VALUES (NULL, '4', 'Toys', '1', '2017-04-04');
INSERT INTO `my_sales` VALUES (NULL, '3', 'Food', '1', '2017-05-06');
INSERT INTO `my_sales` VALUES (NULL, '2', 'Water', '1', '2017-04-07');
INSERT INTO `my_sales` VALUES (NULL, '4', 'Food', '1', '2017-01-02');
INSERT INTO `my_sales` VALUES (NULL, '1', 'Food', '0', '2017-07-09');

RESULT:

The query would join the two tables and yield the following result:

id  contact_id  description is_open create_date_time    id  name
9   1           Food        0       7/9/2017 0:00       1   Jim
3   2           Groceries   1       5/6/2017 0:00       2   Jane
6   3           Food        1       5/6/2017 0:00       3   Roger
5   4           Toys        1       4/4/2017 0:00       4   Alice
  • Have you had a look at [this](https://stackoverflow.com/questions/5140785/mysql-order-before-group-by) and [this](https://stackoverflow.com/questions/14770671/mysql-order-by-before-group-by) question? They seem to ask the same thing you do, so maybe the answers there can help you solve your problem? – waka Sep 21 '17 at 07:49
  • Similar for sure. The first link is not good, but I may be able to figure it out from https://stackoverflow.com/questions/14770671/mysql-order-by-before-group-by -- I'll try now ... – Kevin Howard Goldberg Sep 21 '17 at 07:54
  • It's not working for me. I tried to keep it simple and just work on the inner query SELECT id, max(is_open), max(create_date_time), contact_id, description FROM my_sales GROUP BY contact_id and the results are wrong - data from different records is being combined into a single record and most importantly, the record ids are not correct. This is a non-trivial problem from what I can see. – Kevin Howard Goldberg Sep 21 '17 at 08:37
  • sort on child table before group by is useless. because you can't get it. you should use where condition to filter instead of sort – Hardeep Singh Sep 21 '17 at 12:44
  • Work on groupwise max for the first part. Then do a UNION for the second. – Rick James Sep 22 '17 at 01:33

1 Answers1

1

In database versions that support "window functions" this style of problem can be solved by using ROW_NUMBER() OVER() and the required ordering is applied within the OVER clause, as follows (using Mariadb 10.2 ):

select
*
from (
     select
      s.*, c.name
     , row_number() over(partition by c.id order by s.is_open DESC, create_date_time DESC) as rn
     from my_contacts c
     left join my_sales s on c.id = s.contact_id
     ) d
where rn = 1
order by d.contact_id

Note that you must filter via an alias of the row_number() hence you need a subquery construct as you see above.

Result

id | contact_id | description | is_open | create_date_time    | name  | rn
-: | ---------: | :---------- | ------: | :------------------ | :---- | -:
 9 |          1 | Food        |       0 | 2017-07-09 00:00:00 | Jim   |  1
 3 |          2 | Groceries   |       1 | 2017-05-06 00:00:00 | Jane  |  1
 6 |          3 | Food        |       1 | 2017-05-06 00:00:00 | Roger |  1
 5 |          4 | Toys        |       1 | 2017-04-04 00:00:00 | Alice |  1

dbfiddle here


For MySQL or Mariadb versions which do not support ROW_NUMBER() try the following approach should mimic the effect (and achieve the same result) of row_number() by using variables:

select
        d.contact_id
      , d.description
      , d.is_open
      , d.create_date_time
      , c.name
from (
        SELECT
                @row_num :=IF(@prev_value=s.contact_id,@row_num+1,1) AS RN
              , s.contact_id
              , s.description
              , s.is_open
              , s.create_date_time
              , @prev_value := s.contact_id
        FROM my_sales s
        CROSS JOIN (
                    SELECT @row_num :=1 x,  @prev_value :=0 y
                   ) vars
        ORDER BY
                s.contact_id
              , s.is_open DESC
              , s.create_date_time DESC
    ) d
inner join my_contacts c on c.id = d.contact_id
where d.rn = 1
order by d.contact_id

;

for this approach see: http://sqlfiddle.com/#!9/bfa9809/1

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51