We have a database with over 7 million entries, and we are struggling to find the best SQL query to obtain the latest entries for each a & b & c combination on each version. For instance:
create table my_table
(
id int auto_increment primary key,
a int not null,
b varchar(64) not null,
c varchar(128) not null,
version varchar(64) not null,
timestamp datetime not null
);
With the following indexes:
/* Composite Index on a,b,c columns */
create index ix_my_table_a_b_c on my_table (a, b, c);
/* Index on version */
create index ix_my_table_version on my_table (version);
If we have the following data:
+----+---+-------+-------+---------+---------------------+
| id | a | b | c | version | timestamp |
+----+---+-------+-------+---------+---------------------+
| 1 | 1 | A84BS | BBK3S | A | 2020-01-01 00:00:00 |
| 2 | 1 | A84BS | BBK3S | A | 2020-01-01 01:00:00 |
| 3 | 1 | A84BS | BBK3S | A | 2020-01-01 02:00:00 |
| 4 | 2 | H8SSS | KKL90 | A | 2020-01-01 01:00:00 |
| 5 | 2 | H8SSS | KKL90 | A | 2020-01-01 00:00:00 |
| 6 | 3 | JKSIE | OOPW2 | A | 2020-01-01 02:00:00 |
| 7 | 4 | KL093 | IURYD | B | 2020-01-01 02:00:00 |
+----+---+-------+-------+---------+---------------------+
The returned rows for version A would be all the unique a,b,c combinations with version=A
and the higher timestamp:
+----+---+-------+-------+---------+---------------------+
| id | a | b | c | version | timestamp |
+----+---+-------+-------+---------+---------------------+
| 3 | 1 | A84BS | BBK3S | A | 2020-01-01 02:00:00 |
| 4 | 2 | H8SSS | KKL90 | A | 2020-01-01 01:00:00 |
| 6 | 3 | JKSIE | OOPW2 | A | 2020-01-01 02:00:00 |
+----+---+-------+-------+---------+---------------------+
I have tried the following query to obtain these output, but doesn't seems to be so fast:
SELECT first_value(id) OVER(PARTITION BY a, b, c ORDER BY timestamp desc) latest_rows FROM my_table WHERE version='A';
Which would be the best query to obtain this result? Should I change the indexes or the structure of the table to obtain a significant improvement?