I have two tables, store_customers
and additional_customers
.
Each
store_id
in thestore_customers
table must be uniqueEach
store_id, customer_id
in theadditional_customers
table must be unique.Each
store_id
in theadditional_customers
table must exist in thestore_customers
tableCREATE TABLE store_customers ( autoinc int(10) unsigned NOT NULL AUTO_INCREMENT, store_id varchar(50) NOT NULL, customer_id varchar(50) NOT NULL, datetime datetime NOT NULL, field_1 varchar(50) DEFAULT NULL, field_2 varchar(50) DEFAULT NULL, PRIMARY KEY ('autoinc'), UNIQUE KEY 'store_id' ('store_id'), KEY 'customer_number' ('customer_number') ); INSERT INTO store_customers (store_id, customer_id, datetime, field_1, field_2) VALUES ('100', '10', '2011-01-01', 'aaa', 'bbb'), ('200', '20', '2012-01-01', 'ccc', 'ddd'), ('300', '20', '2013-01-01', 'eee', 'fff'), ('400', '40', '2014-01-01', 'ggg', 'hhh'), ('500', '50', '2015-01-01', 'iii', 'jjj'), ('600', '50', '2016-01-01', 'kkk', 'lll'), ('700', '70', '2017-01-01', 'mmm', 'nnn'), ('800', '70', '2018-01-01', 'ooo', 'ppp'), ('900', '90', '2019-01-01', 'qqq', 'rrr'); CREATE TABLE additional_customers ( store_id varchar(50) NOT NULL customer_id varchar(50) NOT NULL ); INSERT INTO store_customers (store_id, customer_id) VALUES ('400', '41'), ('400', '42'), ('500', '51'), ('500', '52'), ('700', '71'), ('700', '72'), ('800', '81'), ('800', '82'), ('900', '70');
To make things easier to read:
SELECT * FROM store_customers;
store_id customer_id datetime field_1 field_2
100 10 2011-01-01 aaa bbb
200 20 2012-01-01 ccc ddd
300 20 2013-01-01 eee fff
400 40 2014-01-01 ggg hhh
500 50 2015-01-01 iii jjj
600 50 2016-01-01 kkk lll
700 70 2017-01-01 mmm nnn
800 70 2018-01-01 ooo ppp
900 90 2019-01-01 qqq rrr
-------------------------------------------------------
SELECT * FROM additional_customers;
store_id customer_id
400 41
400 42
500 51
500 52
700 71
700 72
800 81
800 82
900 70
For each distinct customer_id
, I need to find the matching record in store_customers
with the earliest datetime.
For example, customer_id = 70 appears in store_customers
twice, tied to two different store_ids (700, 800) - and in additional_customers
once, with a different store_id (900) than the records in the prior table. For that customer_id, the earliest datetime is store_id = 700, so I get one row for that customer_id with matching data from store_customers WHERE store_id = 700
.
My end result would be:
store_id customer_id datetime field_1 field_2
100 10 2011-01-01 aaa bbb
200 20 2012-01-01 ccc ddd
400 40 2014-01-01 ggg hhh
400 41 2014-01-01 ggg hhh
400 42 2014-01-01 ggg hhh
500 50 2015-01-01 iii jjj
500 51 2015-01-01 iii jjj
500 52 2015-01-01 iii jjj
700 70 2017-01-01 mmm nnn
700 71 2017-01-01 mmm nnn
700 72 2017-01-01 mmm nnn
800 81 2018-01-01 ooo ppp
800 82 2018-01-01 ooo ppp
900 90 2019-01-01 qqq rrr
My current query "works", but I think it may be picking rows arbitrarily(?) - which I don't want. Also, EXPLAIN tells me this is terribly inefficient:
SELECT sc.store_id,
customers.customer_id,
MIN(sc.datetime),
sc.field_1,
sc.field_2
FROM (
SELECT store_id, customer_id
FROM store_customers
UNION
SELECT store_id, customer_id
FROM additional_customers
) AS customers
JOIN store_customers sc
ON sc.store_id = customers.store_id
OR sc.customer_id = customers.customer_id
GROUP BY customer_id;
How can I make a more accurate, efficient query to return my expected result?
Edit: I should add - I can add additional indexes if necessary. Also, I'm using MySQL version 5.7 - so I can't use window functions.