1

I want to get latest records of the table with some settings_ids.

id settings_id added_date
1 7 2022-08-23 01:44:24
2 9 2022-08-23 01:44:24
3 11 2022-08-23 01:44:24
4 7 2022-08-25 01:44:24
5 9 2022-08-25 01:44:24
6 11 2022-08-25 01:44:24
7 7 2022-08-26 01:44:24
8 9 2022-08-26 01:44:24
9 11 2022-08-26 01:44:24
SELECT id, settings_id, MAX(added_date) 
FROM data_rows 
WHERE settings_id IN (7,9,11) 
GROUP BY settings_id;

Expected Result

id settings_id added_date
7 7 2022-08-26 01:44:24
8 9 2022-08-26 01:44:24
9 11 2022-08-26 01:44:24

I am getting the result I want but the thing is it taking more than a minute to get the data.

Is there a way to reduce the time taken by this query?

Thanks

Rick James
  • 135,179
  • 13
  • 127
  • 222
rkrathor
  • 77
  • 7
  • 1
    How many total records are there and do you have the `settings_id` and `addedDate` column indexed? – Altimus Prime Dec 27 '22 at 16:54
  • It's not an answer to the question, but you need an aggregation function around `id`, since it's not functionally dependent on `settings_id`. See https://stackoverflow.com/questions/34115174/error-related-to-only-full-group-by-when-executing-a-query-in-mysql – Barmar Dec 27 '22 at 16:55
  • Have you tried creating an index? Such as on `(settings_id, addedDate)`? (Also please don't mix casing, it should be `added_date`) – MatBailie Dec 27 '22 at 16:58
  • Translation - Do you have a PRIMARY KEY ? (suggested to set it for id) How many records are we talking. If its minutes .. it better be in the hundreds of thousands if not millions. Also depends on the stats on your MYSQL server. Your SELECT is very specific so that is good, no use of * etc when you don't need everything is good practice. – easleyfixed Dec 27 '22 at 17:02
  • @AltimusPrime records are thousands and yes indexing is there. – rkrathor Dec 27 '22 at 17:19
  • 1
    @easleyfixed, yes Primary key is there and yes records are in thousands and increasing every day. Thanks – rkrathor Dec 27 '22 at 17:22
  • @easleyfixed What does the primary key have to do with this? It's probably `id`, but it's irrelevant to this query. `settings_id` can't be a primary key, since it has duplicates. – Barmar Dec 27 '22 at 17:26
  • "Indexing is there" is vague. There should be a separate index for each. Thousands of records should actually process fast even without the indexing because it's a trivial amount of data. Are there thousands of columns also? Maybe the problem is in the server this is running on. Is this a micro server or something? – Altimus Prime Dec 27 '22 at 17:27
  • For context, I just ran your same query on a table of mine with with several thousand rows and it ran in just .047 seconds, so there's no optimization necessary on the query itself. – Altimus Prime Dec 27 '22 at 17:36
  • Ahh I didn't see the duplicates, you got a good point. Good deal, glad you got a primary key, so its prob like Altimus said, Indexing could help quite a bit. – easleyfixed Dec 27 '22 at 17:38
  • 1
    @AltimusPrime, thanks for your time. There is only 10 columns, indexing was there and yes data should come fast and this part (WHERE settings_id IN (7,9,11) GROUP BY settings_id;) was taking the time. Now when I added indexing again it giving the result in milliseconds. I will still check why it was not working with old indexing. – rkrathor Dec 27 '22 at 17:44
  • Thanks everyone for your quick response and time. – rkrathor Dec 27 '22 at 17:47
  • I ran the same query on a million rows (290Mb) with no index on an Azure Standard B1s and it took 6.7 seconds. – Altimus Prime Dec 27 '22 at 17:55
  • "indexing is there" is too imprecise -- Please provide `SHOW CREATE TABLE`. – Rick James Dec 29 '22 at 20:50

2 Answers2

4

On MySQL 8+, your requirement is easily met using ROW_NUMBER:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY settings_id ORDER BY addedDate DESC) rn
    FROM data_rows 
    WHERE settings_id IN (7, 9, 11)
)

SELECT id, settings_id, addedDate
FROM cte
WHERE rn = 1
ORDER BY settings_id;

As for optimizing the above query, an index on (settings_id, dateAdded DESC) should help:

CREATE INDEX idx ON data_rows (settings_id, dateAdded);

This index, if used, should let MySQL rapidly compute the required row number.

Edit:

On MySQL 5.7, use this query:

SELECT d1.id, d1.settings_id, d1.addedDate
FROM data_rows d1
INNER JOIN
(
    SELECT settings_id, MAX(addedDate) AS maxAddedDate
    FROM data_rows
    WHERE settings_id IN (7, 9, 11)
    GROUP BY settings_id
) d2
    ON d2.settings_id = d1.settings_id AND
       d2.maxAddedDate = d1.addedDate
WHERE
    d1.settings_id IN (7, 9, 11)
ORDER BY
    d1.settings_id;

Use the same index as suggested above.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • The question is about how to fix the performance problem, not how to write the query correctly. For that there are a number of duplicates you can refer to. The one I use is https://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group-mysql?rq=1 – Barmar Dec 27 '22 at 17:00
  • @Barmar -- I think Tim correctly identified the solution here -- are you suggesting this question should be closed as a duplicate? Shouldn't you comment on the question then? – Hogan Dec 27 '22 at 17:23
  • @Tim Biegeleisen, Mysql server version is 5.7 and can't change right now. – rkrathor Dec 27 '22 at 17:25
  • 1
    @Hogan When Barmar dropped that comment I was still in the process of typing out the index portion of my answer. IMO correcting the query first is neccessary before we tune it. – Tim Biegeleisen Dec 27 '22 at 17:25
  • 1
    @Hogan There's probably lots of questions about optimizing queries using indexes, but I don't have a duplicate handy. – Barmar Dec 27 '22 at 17:27
  • Ok, I understand now -- this is a combo question of often asked max value in joined table and index creation. – Hogan Dec 27 '22 at 17:29
  • @rkrathor Check my updated answer for a MySQL 5.7 friendly query. – Tim Biegeleisen Dec 27 '22 at 17:30
  • @TimBiegeleisen, I remove old indexing and added index as you suggested and then run your 5.7 friendly query and got the data in .033 sec and I run my own query which took .016. I guess its solve as of now. Thanks – rkrathor Dec 27 '22 at 17:38
  • Thanks everyone for your quick response and time. – rkrathor Dec 27 '22 at 17:47
  • @rkathor You are aware that your query and these ones don't do the exact same thing? The queries here ***always*** give the latest id per group, but your query implicitly does `ANY_VALUE(id)`, and so does ***not*** guarantee the latest value. It's explicitly mentioned in the manual. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html – MatBailie Dec 27 '22 at 17:58
1

Be aware that you won't necessarily get the id that matches the MAX. Cf "only_full_group_by".

For this (no id):

SELECT settings_id, MAX(added_date) 
FROM data_rows 
WHERE settings_id IN (7,9,11) 
GROUP BY settings_id;

Simply have INDEX(settings_id, added_date)

For also getting the matching id, see the groupwise max tag. Or ids -- there could be dup dates?

If there are no dup dates (and you want id), then

SELECT MAX(id), settings_id, MAX(added_date) 
FROM data_rows 
WHERE settings_id IN (7,9,11) 
GROUP BY settings_id;

At that point, do you really need id? Consider getting rid of id and changing to

PRIMARY KEY(setting_id, added_date)

(and going back to my first SELECT suggestion)

Do note that a "composite" (multi-column) index is not the same as separate indexes on each column.

Rick James
  • 135,179
  • 13
  • 127
  • 222