I have a large table with millions of rows like this:
CREATE TABLE mytable (
    row_id    bigint,
    col_id    bigint,
    value     double precision,
    timestamp timestamp
);
Given:
- list_row = a list of row_ids (can be ordered if needed)
- list_col = a list of col_ids (again, can be ordered if needed)
- Both lists may be very large (maybe tens of thousands of entries)
- The table above may have many millions of rows

How do I (efficiently) return a result where:
- Columns are all the col_ids present in list_col, occurring in the same order as the col_ids occur in list_col
- Rows are all the row_ids present in list_row (they need not occur in the same order)
- Each field contains the value for the given row_id and col_id pair
- We are only interested in the most recently recorded value for any row_id:col_id pair, i.e. use MAX(timestamp) or something similar as a filter (see the sketch after this list)
- If there is no recorded value for a given row_id:col_id co-ordinate, that field should be null.
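To make the "most recent" filter concrete, something along the lines of the sketch below is what I mean. DISTINCT ON is just one way of expressing it, and :list_row / :list_col are placeholders for however the two lists end up being passed in (e.g. as bigint[] parameters):

-- Latest value per (row_id, col_id) pair, restricted to the two input lists.
-- :list_row and :list_col are placeholders for the supplied lists.
SELECT DISTINCT ON (row_id, col_id)
       row_id, col_id, value
FROM   mytable
WHERE  row_id = ANY (:list_row)
AND    col_id = ANY (:list_col)
ORDER  BY row_id, col_id, timestamp DESC;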
A visual example to clarify. The initial table:
+--------+--------+-------+-----------+
| row_id | col_id | value | timestamp |
+========+========+=======+===========+
| 10     | 20     | 100   | 2016-0... |
| 10     | 21     | 200   | 2015-0... |
| 11     | 20     | 300   | 2016-1... |
| 11     | 22     | 400   | 2016-0... |
+--------+--------+-------+-----------+
becomes:
              col_id →
            +-----+-----+-----+
            |  20 |  21 |  22 |
            +=====+=====+=====+
row_id (10) | 100 | 200 |     |
   ↓   (11) | 300 |     | 400 |
            +-----+-----+-----+
I suspect that the correct answer is to start by creating a temporary table with the target col_ids as columns and then doing some sort of join. I cannot work out how to do this efficiently. Is it possible to do this without needing a temporary table for each row_id?
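For reference, the closest I can get by hand for the tiny example above is a conditional-aggregation pivot like the sketch below (one FILTERed aggregate per col_id over the latest-value subquery, with the three example col_ids and the column aliases hard-coded). Obviously this does not scale to a list_col with tens of thousands of entries, which is exactly the problem:

-- Hand-written pivot for just the example data: one FILTERed aggregate per
-- col_id, over the "latest value per pair" subquery. Column aliases are
-- arbitrary; the real list_col is far too long to spell out like this.
SELECT row_id,
       MAX(value) FILTER (WHERE col_id = 20) AS col_20,
       MAX(value) FILTER (WHERE col_id = 21) AS col_21,
       MAX(value) FILTER (WHERE col_id = 22) AS col_22
FROM  (
        SELECT DISTINCT ON (row_id, col_id) row_id, col_id, value
        FROM   mytable
        WHERE  row_id IN (10, 11)
        AND    col_id IN (20, 21, 22)
        ORDER  BY row_id, col_id, timestamp DESC
      ) AS latest
GROUP BY row_id;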