1

I previously had this table:

CREATE TABLE traces_v0
 ( canvas_id UUID NOT NULL
 , tlid BIGINT NOT NULL
 , trace_id UUID NOT NULL
 , timestamp TIMESTAMP WITH TIME ZONE NOT NULL
 , PRIMARY KEY (canvas_id, tlid, trace_id)
 );

which I'm trying to change into this table:

CREATE TABLE traces_v0
 ( canvas_id UUID NOT NULL
 , root_tlid BIGINT NOT NULL
 , trace_id UUID NOT NULL
 , callgraph_tlids BIGINT[] NOT NULL
 , timestamp TIMESTAMP WITH TIME ZONE NOT NULL
 , PRIMARY KEY (canvas_id, root_tlid, trace_id)
 );

Which is to say, where previously there was one row per (tlid, trace_id), there is now a single row with a trace_id and an array of callgraph_tlids.

I have a query which worked well on the old table:

SELECT tlid, trace_id
  FROM (
    SELECT
      tlid, trace_id,
      ROW_NUMBER() OVER (PARTITION BY tlid ORDER BY timestamp DESC) as row_num
    FROM traces_v0
    WHERE tlid = ANY(@tlids::bigint[])
      AND canvas_id = @canvasID
  ) t
  WHERE row_num <= 10

This fetches the last 10 (tlid, trace_id) for each of tlids (a bigint array) ordered by timestamp. This is exactly what I need and was very effective.

(fyi: the "at" (@tlids) syntax is just a fancy way of writing $1, supported by my postgres driver)

I'm struggling to port this to the new table layout. I came up with the following which works except that it doesn't limit to 10 per tlid ordered by timestamp:

SELECT callgraph_tlids, trace_id
FROM traces_v0
WHERE @tlids && callgraph_tlids  -- '&&' is the array overlap operator
  AND canvas_id = @canvasID
ORDER BY timestamp DESC"

How can I do this query where I limit the results to 10 rows per tlid, ordered by timestamp?

I'm using Postgres 9.6 if that matters.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Paul Biggar
  • 27,579
  • 21
  • 99
  • 152

1 Answers1

0

How can I do this query where I limit the results to 10 rows per tlid, ordered by timestamp?

If timestamps for all rows in the old design that were aggregated into the same array in the new design have been the same all along, then this query for the new design is logically equivalent:

SELECT trace_id, tlid
FROM  (
   SELECT t.trace_id, c.tlid
        , row_number() OVER (PARTITION BY c.tlid ORDER BY t.timestamp DESC) AS rn
   FROM   traces_v0 t
   JOIN   LATERAL unnest(t.callgraph_tlids) c(tlid) ON c.tlid = ANY(@tlids)
   WHERE  t.canvas_id = @canvasid
   AND    t.callgraph_tlids && @tlids
   ) sub
WHERE  rn <= 10;

But that means ORDER BY timestamp DESC has been a non-deterministic sort order all along and your new query is just as unreliable as the old one. The top 10 may change from one invocation of the query to the next. If you want a deterministic results, add more expressions as tiebreaker(s) to the ORDER BY list until the sort order is unambiguous - probably in any case.

The WHERE condition t.callgraph_tlids && @tlids on top of the join condition ON c.tlid = ANY(@tlids) is logically redundant, but typically helps to make your query much faster, especially with a GIN index on callgraph_tlids. See:

About the LATERAL join:

Even works in your outdated, unsupported Postgres 9.6. But upgrade to a current version in any case.

If timestamps of aggregated rows were not the same, then the answer is: You cannot.

The new design removes required information. The old design has a separate timestamp for each tlid, while the new design only has a single timestamp for a whole array (callgraph_tlids).

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228