-1

I have a large SQL query (MySQL 5.7.32) that gets all sorts of data AND is including pricing data for a particular day, if there is price data available for this day. Done by a simple LEFT JOIN:

SELECT *
FROM merchants m CROSS JOIN products p
LEFT JOIN prices mps
    ON m.id = mps.id AND p.article_id = mps.article_id AND mps.DATE = $date

In some cases there is no price for this particular day. In that case I would like to get the last recent price available.

Is this possible with LEFT JOIN or do I have to add even more complexity to my statement?

merlin
  • 2,717
  • 3
  • 29
  • 59

1 Answers1

1

Yes it is possible

SELECT m.*, t.*
FROM merchants m
JOIN (
    SELECT MAX(`date`) as max_date, id
    FROM prices
    WHERE `date` <= ?
    GROUP BY id
) t ON t.id = m.id

Edit : 2 steps queries

Assuming your shop is displaying 20 products per pages. You can run a 1st query like this :

SELECT m.*
FROM merchants m
WHERE some_criterias
LIMIT 20 OFFSET 0

and then you pass the result of this 1st query to the 2nd one :

SELECT m.*, t.*
FROM merchants m
JOIN (
    SELECT MAX(`date`) as max_date, p.merchant_id
    FROM prices p
    AND merchant_id IN (?, ?, ?...)
    WHERE `date` <= ?
    GROUP BY id
) t ON t.merchant_id = m.id
Shaolin
  • 415
  • 5
  • 11
  • Interesting approach. I might need to mention that the price table has 300M rows. This query seems to be going through all of them with the <= date filter. It runns virtually for ever. Did I oversee something or does this rule out your solution? – merlin Dec 31 '20 at 22:40
  • here you can find a benchmark of my query : https://kristiannielsen.livejournal.com/6745.html as you can see it has the best performance. If your table has 300M rows you can split the process into 2 queries. A first query to get a list of products and a 2nd query (the one of my answer) which use this list of products. I am going to edit my answer. – Shaolin Jan 01 '21 at 07:15
  • How do you pass the the list of products to the second query? I can't use a middleware as this is a query to be used in Grafana. – merlin Jan 01 '21 at 12:06
  • The benchmark shows that 1M rows are retreived in 0.01 sec. With 300M it would be 3s. For analysis purpose it is fast enought. As far as we are, my answer fits your question, a +1 or tick would be really appreciate. If you need a specific answer about using it in Grafana, I would suggest you to create a new question. – Shaolin Jan 01 '21 at 14:19
  • I believe the problem is due to a cross join I am having in this query. Your solution looks similar to this one: https://stackoverflow.com/questions/583954/how-can-i-select-multiple-columns-from-a-subquery-in-sql-server-that-should-ha I tried the solution with two LEFT OUTER JOINs as well, but all queries run for > 60s. A subselect works in 60ms but returns only one value per definition. So I am somehow running out of ideas... – merlin Jan 01 '21 at 21:09
  • 1
    may be you can create an aggregate table that you rebuild every day or every hour. In this table you will put the result of ```SELECT MAX(`date`) as max_date, id FROM prices WHERE `date` <= ? GROUP BY id``` then the join will be a lot faster. – Shaolin Jan 01 '21 at 21:23
  • That sounds like a plan. Something else must be wrong, on a dev server with 40M rows that query alone takes >60s. Maybe an index is wrong? I added the DDL to illustrate the schema: https://www.db-fiddle.com/f/sPu2PSnY9ftSr4CkPDBc1w/2 On prod it executed in 1m. Maybe good enough... – merlin Jan 01 '21 at 22:16
  • you need an index on the date field, also you should use a real date like '2021-01-01' instead of curdate() for the index to work, more details here : https://stackoverflow.com/questions/55795485/why-mysql-hit-index-when-column-wrap-with-date-function if you find also my comment usefull don't forget to +1 on them :P – Shaolin Jan 01 '21 at 22:23
  • 1
    There is an index on DATE,SKU,MERCHANT_ID (unique). Do you believe an extra seperate index on date will help significantly? I implemented now a stored procedure that updates that table every night. Thank you for you help. Please also consider upvoting the question. – merlin Jan 01 '21 at 23:08
  • Sorry I didn't read the fiddle carefully. You are right, the unique index is enough. – Shaolin Jan 01 '21 at 23:21
  • I just realized that the results are wrong. It does get the latest date, but the coresponding values are from other rows of that set. It just gets (max date). Price e.g. is from another dataset with the grouped values. Are øu sure the max_date approach is right? – merlin Jan 02 '21 at 05:56
  • According to other SO posts MySQL does not support this sollution as it does not agregate the other comuns to the max column which makes the result useless. Please correct me if I am wrong. – merlin Jan 02 '21 at 11:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/226746/discussion-between-shaolin-and-merlin). – Shaolin Jan 02 '21 at 17:22