I am using psygopg2 to connect to a TimescaleDB instance. I want to query the latest entries from a big table (35 million rows) containing price information of assets with the columns datetime
, asset_id
, price
, created_at
:
create table prices
(
datetime timestamptz not null,
asset_id integer references assets (asset_id),
price real not null,
created_at timestamptz not null default now(),
unique (datetime, asset_id)
);
This table was turned into a TimescaleDB hypertable with
select create_hypertable('"prices"', 'datetime');
and from this article's suggestion created the following index:
create index latest_prices_idx on prices (asset_id, datetime desc);
I use "Option 3" from this article to query the most recent price of each asset:
select distinct on (asset_id) datetime, asset_id, price
from prices
order by asset_id, datetime desc;
which takes ~300 ms via my IDE's console. However, when doing the same query via psycopg2 it takes about 10 seconds:
cursor.execute("""
select distinct on (asset_id) datetime, asset_id, price
from prices
order by asset_id, datetime desc;
""")
res = cursor.fetchall()
Why might this be the case?
Prepending explain
in front of the query outputs the exact same result for both the IDE and psycopg.
Edit: Answers to questions/suggestions in the comments:
- The database is running in a local docker container.
- The IDE is PyCharm, I do the queries from its database console and from a Python script within the IDE.
- Using a server side cursor (like this) did not change anything.
- PyCharm gives the query time in its console output. For getting the time of the Python code, I measure the execution time of the
cursor.execute(...)
call withtime.perf_counter()
. - Using psql in the shell in the container has the bad performance as well (~10 s).
- When dropping the index the performance flips: psql and psycopg take about 6 seconds, PyCharm's console about 13 seconds. The original behavior is reproducible when re-creating the index.