0

We have a database with over 7 million entries, and we are struggling to find the best SQL query to obtain the latest entries for each a & b & c combination on each version. For instance:

create table my_table
(
  id                int auto_increment    primary key,
  a                 int                                  not null,
  b                 varchar(64)                          not null,
  c                 varchar(128)                         not null,
  version           varchar(64)                          not null,
  timestamp         datetime                             not null
);

With the following indexes:

/* Composite Index on a,b,c columns */
create index ix_my_table_a_b_c on my_table (a, b, c);
/* Index on version */
create index ix_my_table_version on my_table (version);

If we have the following data:

+----+---+-------+-------+---------+---------------------+
| id | a |   b   |   c   | version |      timestamp      |
+----+---+-------+-------+---------+---------------------+
|  1 | 1 | A84BS | BBK3S |    A    | 2020-01-01 00:00:00 |
|  2 | 1 | A84BS | BBK3S |    A    | 2020-01-01 01:00:00 |
|  3 | 1 | A84BS | BBK3S |    A    | 2020-01-01 02:00:00 |
|  4 | 2 | H8SSS | KKL90 |    A    | 2020-01-01 01:00:00 |
|  5 | 2 | H8SSS | KKL90 |    A    | 2020-01-01 00:00:00 |
|  6 | 3 | JKSIE | OOPW2 |    A    | 2020-01-01 02:00:00 |
|  7 | 4 | KL093 | IURYD |    B    | 2020-01-01 02:00:00 |
+----+---+-------+-------+---------+---------------------+

The returned rows for version A would be all the unique a,b,c combinations with version=A and the higher timestamp:

+----+---+-------+-------+---------+---------------------+
| id | a |   b   |   c   | version |      timestamp      |
+----+---+-------+-------+---------+---------------------+
|  3 | 1 | A84BS | BBK3S |    A    | 2020-01-01 02:00:00 |
|  4 | 2 | H8SSS | KKL90 |    A    | 2020-01-01 01:00:00 |
|  6 | 3 | JKSIE | OOPW2 |    A    | 2020-01-01 02:00:00 |
+----+---+-------+-------+---------+---------------------+

I have tried the following query to obtain these output, but doesn't seems to be so fast:

SELECT first_value(id) OVER(PARTITION BY a, b, c ORDER BY timestamp desc) latest_rows FROM my_table WHERE version='A';

Which would be the best query to obtain this result? Should I change the indexes or the structure of the table to obtain a significant improvement?

Ander
  • 5,093
  • 7
  • 41
  • 70
  • @Strawberry.. are you sure this is not a relevant question? The related answer doesn't really explain how to build a performant query for a composite set of columns https://stackoverflow.com/users/1529673/strawberry – Ander Apr 01 '20 at 08:05
  • Note that a query can only use one index (per table instance?) so version should be part of your composite. So, just play with the order of the index (version,a,b,c), (version,a,c,b), (b,a,c,version), etc. and see what works fastest. One way or another, this is the single most frequently asked question under this tag, but it you're still really struggling, I'll vote to reopen – Strawberry Apr 01 '20 at 08:43

0 Answers0