I'm having a problem with a postgresql table. The table has 14 columns, 2 of them that are indexed as following:
Column | Type | Modifiers
---------------------+------+-----------
u_id | text | not null
p_date | date | not null
...
Indexes:
"unq_uid_pdate" UNIQUE CONSTRAINT, btree (u_id, p_date)
"pdate_idx" btree (p_date)
The size of the table is around 5 billion rows, with a total of around 10 million distinct values in u_id, and each u_id has around 500 distinct values in p_date on average.
When I try to export by u_id, or without any filters, the performance is good reaching speeds of up to 400K rows per second without significant drops over time.
psql "..." -c "\\copy (SELECT u_id, p_date FROM l_nights) to stdout with csv header" | pv --line-mode --rate > /dev/null
[ 468k/s]
[ 485k/s]
[ 497k/s]
[ 435k/s]
However, if I want to export the approximately 10 million rows of a specific p_date, the performance is very poor, starting at few thousand rows per second, and reaching a stall in few minutes. Adding limit
the behaviour is the same:
psql "..." -c "\\copy (SELECT u_id, p_date FROM l_nights where p_date = '2020-06-01' limit 50000) to stdout with csv header" | pv --line-mode --rate > /dev/null
[1.09k/s]
[1.28k/s]
[1.48k/s]
[3.01k/s]
[ 954 /s]
[1.40k/s]
[ 918 /s]
[ 849 /s]
[ 383 /s]
[ 364 /s]
[ 358 /s]
[0.00 /s]
[ 170 /s]
[0.00 /s]
The query is using the proper index, based on the explain response:
explain analyse select u_id, p_date from l_nights where p_date = '2020-06-01' limit 50000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.72..158646.09 rows=50000 width=15) (actual time=9.672..207430.728 rows=50000 loops=1)
-> Index Scan using pdate_idx on l_nights (cost=0.72..233913736.58 rows=73722207 width=15) (actual time=9.671..207396.204 rows=50000 loops=1)
Index Cond: (p_date = '2020-06-01'::date)
Planning Time: 0.182 ms
Execution Time: 207451.360 ms
Based on the current performance, it seems to me like the search was doing a full scan, even though the column is indexed. Other export strategies like getting a postgresql cursor in node.js also behave the same.
What am I doing wrong?