
I am using psycopg2 to connect to a TimescaleDB instance. I want to query the latest entries from a big table (35 million rows) of asset price information with the columns datetime, asset_id, price, and created_at:

create table prices
(
    datetime   timestamptz not null,
    asset_id   integer references assets (asset_id),
    price      real        not null,
    created_at timestamptz not null default now(),

    unique (datetime, asset_id)
);

This table was turned into a TimescaleDB hypertable with

select create_hypertable('"prices"', 'datetime');

and, following this article's suggestion, I created this index:

create index latest_prices_idx on prices (asset_id, datetime desc);

I use "Option 3" from this article to query the most recent price of each asset:

select distinct on (asset_id) datetime, asset_id, price
from prices
order by asset_id, datetime desc;
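For context, `distinct on (asset_id)` keeps only the first row per `asset_id` in the sort order, so with `datetime desc` it returns each asset's most recent price. A minimal Python sketch of the same semantics (the sample rows are made up):

```python
# Emulate "select distinct on (asset_id) ... order by asset_id, datetime desc"
# in plain Python: keep the newest row per asset_id. Sample rows are made up.
rows = [
    ("2023-05-15T10:00", 1, 100.0),
    ("2023-05-15T11:00", 1, 101.5),
    ("2023-05-15T09:00", 2, 50.0),
]

latest = {}
for dt, asset_id, price in sorted(rows, key=lambda r: r[0]):
    latest[asset_id] = (dt, asset_id, price)  # later (newer) rows overwrite

result = [latest[a] for a in sorted(latest)]  # ordered by asset_id
print(result)
```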

which takes ~300 ms in my IDE's console. However, the same query via psycopg2 takes about 10 seconds:

cursor.execute("""
    select distinct on (asset_id) datetime, asset_id, price
    from prices
    order by asset_id, datetime desc;
""")
res = cursor.fetchall()
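When timing this from Python, it can help to measure `execute()` and `fetchall()` separately, since a client may not pull all rows until fetch time. A sketch with a hypothetical helper; the stub cursor only stands in for a real psycopg2 cursor so the snippet runs standalone:

```python
import time

def timed_query(cursor, sql):
    """Run sql and return (execute_seconds, fetch_seconds, rows)."""
    t0 = time.perf_counter()
    cursor.execute(sql)
    t1 = time.perf_counter()
    rows = cursor.fetchall()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, rows

# Stub cursor so the sketch runs without a database; a real psycopg2
# cursor exposes the same execute()/fetchall() interface.
class StubCursor:
    def execute(self, sql):
        self._rows = [("2023-05-15", 1, 101.5)]
    def fetchall(self):
        return self._rows

exec_s, fetch_s, rows = timed_query(StubCursor(), "select 1")
print(len(rows))
```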

Why might this be the case?

Prepending explain to the query produces exactly the same output for both the IDE and psycopg2.

Edit: Answers to questions/suggestions in the comments:

  • The database is running in a local docker container.
  • The IDE is PyCharm, I do the queries from its database console and from a Python script within the IDE.
  • Using a server side cursor (like this) did not change anything.
  • PyCharm gives the query time in its console output. For getting the time of the Python code, I measure the execution time of the cursor.execute(...) call with time.perf_counter().
  • Using psql in the shell in the container has the bad performance as well (~10 s).
  • When dropping the index the performance flips: psql and psycopg take about 6 seconds, PyCharm's console about 13 seconds. The original behavior is reproducible when re-creating the index.
trivicious
  • 1) Are you connecting locally from the IDE and remotely from the `psycopg2` code? 2) Have you tried a [server side cursor](https://www.psycopg.org/docs/usage.html#server-side-cursors)? – Adrian Klaver May 15 '23 at 18:28
  • @AdrianKlaver The db is running locally in a docker container and both connections are via PyCharm (either via its database console feature or via a Python script). – trivicious May 15 '23 at 18:31
  • So where are you measuring the time? Best guess this is a PyCharm issue with displaying the results. Have you tried using `psql`? **Add this information and information in previous comment as update to question** so folks don't have to walk through the comments for relevant info. – Adrian Klaver May 15 '23 at 18:35
  • Thanks for your suggestions, I edited the post accordingly. – trivicious May 15 '23 at 19:03
  • 1) Is `prices` a highly active table with a lot of inserts/updates? If so, does a manual `ANALYZE` before running the query help? 2) What does `EXPLAIN (ANALYZE, BUFFERS)` show for the query? – Adrian Klaver May 15 '23 at 19:44
  • Okay, I found the solution... Your intuition about something being wrong with PyCharm was correct. By default it only shows 500 results, but this limit isn't indicated even in the query log... I stumbled on this when trying to count how many rows the query returns (which suddenly took a long time). – trivicious May 15 '23 at 20:05
  • This is why I avoid GUI database programs. `psql` is not perfect but it runs closer to the database and rarely lies to you or editorializes the results. – Adrian Klaver May 15 '23 at 21:22
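The 500-row display limit explains the discrepancy: PyCharm was effectively timing only a partial fetch, while `fetchall()` pulled all rows. Fetching in batches from Python reproduces what the IDE measured, as a sketch (the stub cursor and its data are hypothetical stand-ins for a real psycopg2 cursor):

```python
# A 500-row UI limit means only the first batch is fetched and timed.
# fetchmany() lets Python measure the same partial fetch; the stub below
# stands in for a real psycopg2 cursor (hypothetical data).
class StubCursor:
    def __init__(self, n_rows):
        self._rows = [("2023-05-15", i, 1.0) for i in range(n_rows)]
        self._pos = 0
    def fetchmany(self, size):
        batch = self._rows[self._pos:self._pos + size]
        self._pos += size
        return batch

cur = StubCursor(1200)
first_batch = cur.fetchmany(500)  # what a 500-row UI limit would pull
print(len(first_batch))
```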

0 Answers