3

Here's my table schema:

CREATE TABLE tickers (
    product_id TEXT NOT NULL,
    trade_id INT NOT NULL,
    sequence BIGINT NOT NULL,
    time TIMESTAMPTZ NOT NULL,
    price NUMERIC NOT NULL,
    side TEXT NOT NULL,
    last_size NUMERIC NOT NULL,
    best_bid NUMERIC NOT NULL,
    best_ask NUMERIC NOT NULL,
    PRIMARY KEY (product_id, trade_id)
);

CREATE INDEX idx_tickers_product_id_time ON tickers (product_id, time);

My application subscribes to Coinbase Pro's websocket on the "ticker" channel and inserts a row into the tickers table whenever it receives a message.

The table has over two million rows now.

I learned how to use index skip scan emulation (see: SELECT DISTINCT is slower than expected on my table in PostgreSQL) in PostgreSQL in order to quickly retrieve distinct product_id values from this table, rather than using the slower SELECT DISTINCT method.

I also want to retrieve min/max values for other columns. Here's what I came up with. It takes ~2.9 milliseconds over 2.25 rows.

Is there a better way to accomplish this?

WITH product_ids AS (
    WITH RECURSIVE cte AS (
       (   -- parentheses required
           SELECT product_id
           FROM tickers
           ORDER BY 1
           LIMIT 1
       )
       UNION ALL
       SELECT l.*
       FROM cte c
       CROSS JOIN LATERAL (
          SELECT product_id
          FROM tickers t
          WHERE t.product_id > c.product_id  -- lateral reference
          ORDER BY 1
          LIMIT 1
          ) l
       )
    TABLE cte
)
SELECT
    product_id,
    (SELECT (MAX(trade_id) - MIN(trade_id) + 1) FROM tickers WHERE product_id = product_ids.product_id) AS ticker_count,
    (SELECT MIN(time) FROM tickers WHERE product_id = product_ids.product_id) AS min_time,
    (SELECT MAX(time) FROM tickers WHERE product_id = product_ids.product_id) AS max_time
FROM product_ids
ORDER BY ticker_count DESC
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Richard Gieg
  • 1,276
  • 8
  • 17
  • Are there any other indexes? – Erwin Brandstetter Mar 31 '21 at 21:31
  • There is a combined index on product_id and time. Updated schema in question. – Richard Gieg Mar 31 '21 at 21:33
  • Perfect! We can put it to good use. :) – Erwin Brandstetter Mar 31 '21 at 21:35
  • `(MAX(trade_id) - MIN(trade_id)`? Subtracting IDs? That's not a typo? – Erwin Brandstetter Mar 31 '21 at 21:37
  • That's correct. The trade_id for each product always goes up by 1 as new tickers are broadcasted by Coinbase Pro's websocket "ticker" channel. I previously used `COUNT` in previous iterations of this query, but I came up with this as an optimization because I found that `COUNT` was very slow once my table started filling up with millions of rows. But if we can avoid doing this subtraction I think it would be better. – Richard Gieg Mar 31 '21 at 21:40
  • @ErwinBrandstetter: Oops... it was supposed to be (MAX(trade_id) - MIN(trade_id) + 1), not (MAX(trade_id) - MIN(trade_id)). I edited my question. – Richard Gieg Mar 31 '21 at 21:51
  • `time` is the only column that can be NULL? Typo? – Erwin Brandstetter Mar 31 '21 at 21:55
  • It should be `NOT NULL`. I'll fix it. – Richard Gieg Mar 31 '21 at 21:58
  • Writing this as a comment because I get super annoyed when people say "why don't you redesign your schema?". But have you considered having a seperate version table? One that's a simple mapping between the product_id and the trade_id? And you can join that onto the main table. – Rol Apr 07 '22 at 13:31

1 Answers1

3

Query

Using the existing index on (product_id, time) we can get two for the price of one, i.e. fetch product_id and minimum time in one index scan:

WITH RECURSIVE product_ids AS (
   (   -- parentheses required
   SELECT product_id, time AS min_time
   FROM   tickers
   ORDER  BY 1, 2
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   product_ids p
   CROSS JOIN LATERAL (
      SELECT t.product_id, t.time
      FROM   tickers t
      WHERE  t.product_id > p.product_id
      ORDER  BY 1, 2
      LIMIT  1
      ) l
   )
SELECT product_id, min_time
    , (SELECT MAX(time) FROM tickers WHERE product_id = p.product_id) AS max_time
    , (SELECT MAX(trade_id) - MIN(trade_id) + 1 FROM tickers WHERE product_id = p.product_id) AS ticker_count
FROM   product_ids p
ORDER  BY ticker_count DESC;

Also, no need for a second CTE wrapper.

Indexes

Currently you have two indexes: The PK index on (product_id, trade_id), and another one on (product_id, time). You might optimize this by reversing the column order in one of both. Like:

PRIMARY KEY (trade_id, product_id)

Logically equivalent, but typically more efficient as it covers a wider range of possible queries. See (again):

We only need the existing index on (product_id, time), so no direct effect on this query.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228