The fastest query technique depends on the distribution of values. DISTINCT ON
would be a simple solution in Postgres, ideal for just a few values per id
in each child table. But guessing from your description I expect many rows per id
, so I suggest a solution with LATERAL
joins. Requires Postgres 9.3+:
One more complication for your already-not-so-simple case:
Also the values can be of different types
Alternative 1
Cast all values to text
. Every data type can be cast to text
.
Base query
SELECT m.id, v.timestamp, 1 AS tbl, v.value -- simple int as table id
FROM master m
, LATERAL (
SELECT timestamp, value::text -- cast to text
FROM vtbl1
WHERE id = m.id -- lateral reference
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 2 AS tbl, v.value -- ascending without gaps
FROM master m
, LATERAL (
SELECT timestamp, value::text
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp, 3 AS tbl, value
FROM ...
;
All you need for this to be fast is an index on (id, timestamp)
for each child table. Best in this form (adding value
is only useful if you get index-only scans out of it):
CREATE INDEX vtbl1_combo_idx ON vtbl1 (id, timestamp DESC NULLS LAST, value)
1a. Aggregate (pseudo-crosstab)
To format as desired use aggregate functions on CASE
expressions in Postgres 9.3 or older (like demonstrated by @sgeddes) or (better) the new aggregate FILTER
clause in Postgres 9.4+:
SELECT id, timestamp
, max(value) FILTER (WHERE tbl = 1) AS val1
, max(value) FILTER (WHERE tbl = 2) AS val2
, ...
FROM ( <query frm above> ) t
GROUP BY 1, 2;
1b. Crosstab
Actual cross tabulation (also called "pivot" in other RDBMS) should be considerably faster. You need the additional module tablefunc
installed, instructions below.
The special difficulty here: we have a composite "row name" (id, timestamp)
, but the function expects a single column as row name. So we substitute with row_number()
, but do not display that surrogate key in the result:
SELECT id, timestamp, val1, val2, val3, ...
-- normally SELECT * is enough; explicit list to filter rn
FROM crosstab(
$$
SELECT row_number() OVER (ORDER BY id, timestamp DESC NULLS LAST) AS rn
, id, timestamp, tbl, value
FROM ( <query from above> ) t
ORDER BY 1
$$
, 'SELECT generate_series(1,3)' -- replace 3 with highest table nr.
) AS ct (
rn int, id int, timestamp date
, val1 text, val2 text, val3 text, ...);
Closely related:
Relevant basics:
Alternative 2
Simple, but may be just as fast and preserves original data types:
SELECT id, timestamp
, max(val1) AS val1, max(val2) AS val2, max(val3) AS val3, ...
FROM (
SELECT m.id, v.timestamp
, v.value AS val1, NULL::int AS val2, NULL::numeric AS val3, ...
-- list all values with actual data type
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl1
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, v.value, NULL, ... -- column names & data types defined in first SELECT
FROM master m
, LATERAL (
SELECT timestamp, value
FROM vtbl2
WHERE id = m.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) v
UNION ALL
SELECT m.id, v.timestamp
, NULL, NULL, v.value, ...
FROM ...
) t
GROUP BY 1, 2
ORDER BY 1, 2;
Aside: Never use basic type names or reserved words (in standard SQL) like timestamp
as identifier.