EDIT In my original question I noticed a difference between searching for neighbors using a `JOIN` and using a `WHERE .. IN` clause, which @LukaszSzozda rightfully pointed out is a semi-join. It turns out my node list had duplicates, which explains why the `JOIN` took longer to run. Thanks, @LukaszSzozda. The more important aspect of my question remains, though, and is presented below.
UPDATE I added the relevant configuration options to the bottom and updated the statistics using `ANALYZE` (thanks to @joop). Also, I tested with three different indices (B-Tree, hash, BRIN). Finally, I noticed that different queries returned different numbers of rows into `tmp_nodes`, possibly because of different ordering, so I fixed it to a constant set of 8,000 rather-random nodes (a sketch of how such a set could be pinned follows).
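A minimal sketch of pinning such a reproducible sample, assuming a `nodes` table holding all node ids (the table name and seed here are assumptions, not part of my setup):

SELECT setseed(0.42);              -- fix the PRNG so the sample is repeatable within this session
CREATE TEMP TABLE tmp_nodes AS
    SELECT nid
    FROM nodes                     -- assumed table of all node ids
    ORDER BY random()              -- full sort; acceptable as a one-off for building the test set
    LIMIT 8000;
ANALYZE tmp_nodes;                 -- refresh planner statistics for the new table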
In PostgreSQL, my query to search for the neighbors of 8,000 nodes among ~200*10^6 nodes (within ~1.3*10^9 edges) is slow (~30 seconds using a hash index; see index benchmarking below).
Given the setup I describe below, are there further optimizations for my server software, database, table or query to make the neighbor search faster? I am particularly surprised at this speed considering how well PostgreSQL did on the ArangoDB NoSQL benchmark.
More specifically:
- I am aware of AgensGraph, but do not wish to move to a graph-database solution yet, specifically since I cannot tell how well AgensGraph keeps up to date with PostgreSQL. Can someone explain the performance benefits with regard to how the query actually happens in AgensGraph vs PostgreSQL, so that I can decide whether to migrate?
- Are there any configuration tweaks, whether in the server or the OS, that affect my query according to the plan and cause it to run longer than needed?
Setup
I have a large graph database (~10^9 edges, ~200*10^6 nodes) in PostgreSQL (PostgreSQL 10.1, which I had to pull from the `zesty` PPA), stored on the cloud (DigitalOcean, 6-core, 16GB RAM machine, Ubuntu 17.10, Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz) and set up with the parameters suggested by PGTune (see bottom). I am querying on-server.
I have created forward- and backward-edges tables (see this question):
CREATE TABLE edges_fwd (src BIGINT, dest BIGINT, PRIMARY KEY (src, dest));
CREATE TABLE edges_back (src BIGINT, dest BIGINT, PRIMARY KEY (dest, src));
and clustered both by the respective keys (just in case):
CLUSTER edges_fwd USING edges_fwd_pkey;
CLUSTER edges_back USING edges_back_pkey;
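The hash index that appears in the plans below was created along these lines (the exact statement is reconstructed from the index name in the plans, so treat it as an assumption):

CREATE INDEX ix_edges_fwd_src_hash ON edges_fwd USING hash (src);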
I turned off `enable_seqscan` for the purpose of testing my queries (see the side note below).
I would like to load all the out-edges for 8,000 nodes (which 8,000 nodes these are can change depending on the user query), whose identifiers are listed in a table `tmp_nodes` (with a single column, `nid`). I initially wrote this version of the query (patting myself on the back for already following the lines of the graph talk from PGCon11):
SELECT e.*
FROM tmp_nodes
JOIN edges_fwd AS e
ON e.src = tmp_nodes.nid;
I also tried:
SELECT * FROM edges_fwd AS e
WHERE e.src IN (SELECT nid FROM tmp_nodes);
They are both slow, taking about 30 seconds to run at best (using hash indices). `EXPLAIN ANALYZE` outputs are given below.
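As a side point, `EXPLAIN`'s `BUFFERS` option would also report how many pages were actually touched, which bears directly on the page-read arithmetic below; for example:

EXPLAIN (ANALYZE, BUFFERS)
SELECT e.*
FROM tmp_nodes
JOIN edges_fwd AS e ON e.src = tmp_nodes.nid;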
I expected things to run much faster in general. For looking up 8,000 keys in a clustered table (yes, I know it's not really a clustered index), since the server knows the rows are ordered, I should expect fewer page reads than the total number of rows returned. So while 243,708 rows are fetched, which isn't a small number, they are associated with only 8,000 distinct keys, and the number of reads should not be much larger than that: it's an average of 30 rows per key, or about 1,400 bytes per read (the table size is 56GB and has 1.3B rows, so it's about 46 bytes per row; which, by the way, is quite a bloat for 16 bytes of data). This is far below the page size (4K) of the system. I didn't think reading 8,000 pages, even with random access, should take this long.
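For reference, a quick way to sanity-check the bytes-per-row and rows-per-page figures above (a sketch; it assumes the default 8kB PostgreSQL page size and that `pg_class.reltuples` is up to date after `ANALYZE`):

-- approximate per-row footprint and rows per 8kB page for edges_fwd
SELECT pg_size_pretty(pg_relation_size('edges_fwd'))             AS table_size,
       (pg_relation_size('edges_fwd') / reltuples)::int          AS approx_bytes_per_row,
       (reltuples / (pg_relation_size('edges_fwd') / 8192))::int AS approx_rows_per_page
FROM pg_class
WHERE relname = 'edges_fwd';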
This brings me back to my questions (above).
Forcing index usage
I took advice from answers to another question and, at least for testing (though, since my database is read-only, I might be tempted to use it in production), set `enable_seqscan` to `off` in order to force index usage.
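Concretely (a session-level setting; it works by inflating the planner's cost estimate for sequential scans, which is why the plans below show costs starting at 10000000000):

SET enable_seqscan = off;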
I ran each query 5 times; the times varied by a few seconds here and there.
`EXPLAIN ANALYZE` outputs
Taking care to flush the OS disk cache and restart the server to reflect correct random-seek timings, I used `EXPLAIN ANALYZE` on both queries. I used two types of indexes, B-Tree and hash. I also tried BRIN with different values for the `pages_per_range` option (2, 8, 32 and 128; a sketch of one such index definition follows), but they were all slower (by orders of magnitude) than those mentioned above. I am giving the results below for reference.
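For concreteness, one of the BRIN variants tried would have looked something like this (the index name is hypothetical; only the `pages_per_range` values come from my tests):

CREATE INDEX ix_edges_fwd_src_brin ON edges_fwd USING brin (src) WITH (pages_per_range = 32);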
B-Tree index, `JOIN` query:
Nested Loop (cost=10000000000.58..10025160709.50 rows=15783833 width=16) (actual time=4.546..39152.408 rows=243708 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.712..15.721 rows=8000 loops=1)
-> Index Only Scan using edges_fwd_pkey on edges_fwd e (cost=0.58..3125.34 rows=1973 width=16) (actual time=4.565..4.879 rows=30 loops=8000)
Index Cond: (src = tmp_nodes.nid)
Heap Fetches: 243708
Planning time: 20.962 ms
Execution time: 39175.454 ms
B-Tree index, `WHERE .. IN` query (semi-join):
Nested Loop (cost=10000000136.58..10025160809.50 rows=15783833 width=16) (actual time=9.578..42605.783 rows=243708 loops=1)
-> HashAggregate (cost=10000000136.00..10000000216.00 rows=8000 width=8) (actual time=5.903..35.750 rows=8000 loops=1)
Group Key: tmp_nodes.nid
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.722..2.695 rows=8000 loops=1)
-> Index Only Scan using edges_fwd_pkey on edges_fwd e (cost=0.58..3125.34 rows=1973 width=16) (actual time=4.924..5.309 rows=30 loops=8000)
Index Cond: (src = tmp_nodes.nid)
Heap Fetches: 243708
Planning time: 19.126 ms
Execution time: 42629.084 ms
Hash index, `JOIN` query:
Nested Loop (cost=10000000051.08..10056052287.01 rows=15783833 width=16) (actual time=3.710..34131.371 rows=243708 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.960..13.338 rows=8000 loops=1)
-> Bitmap Heap Scan on edges_fwd e (cost=51.08..6986.79 rows=1973 width=16) (actual time=4.086..4.250 rows=30 loops=8000)
Heap Blocks: exact=8094
-> Bitmap Index Scan on ix_edges_fwd_src_hash (cost=0.00..50.58 rows=1973 width=0) (actual time=2.563..2.563 rows=31 loops=8000)
Execution time: 34155.511 ms
Hash index, `WHERE .. IN` query (semi-join):
Nested Loop (cost=10000000187.08..10056052387.01 rows=15783833 width=16) (actual time=12.766..31834.767 rows=243708 loops=1)
-> HashAggregate (cost=10000000136.00..10000000216.00 rows=8000 width=8) (actual time=6.297..30.760 rows=8000 loops=1)
-> Seq Scan on tmp_nodes (cost=10000000000.00..10000000116.00 rows=8000 width=8) (actual time=0.883..3.108 rows=8000 loops=1)
-> Bitmap Heap Scan on edges_fwd e (cost=51.08..6986.79 rows=1973 width=16) (actual time=3.768..3.958 rows=30 loops=8000)
Heap Blocks: exact=8094
-> Bitmap Index Scan on ix_edges_fwd_src_hash (cost=0.00..50.58 rows=1973 width=0) (actual time=2.340..2.340 rows=31 loops=8000)
Execution time: 31857.692 ms
`postgresql.conf` settings
I set the following configuration options as suggested by PGTune:
max_connections = 10
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 69905kB
min_wal_size = 4GB
max_wal_size = 8GB
max_worker_processes = 6
max_parallel_workers_per_gather = 3
max_parallel_workers = 6
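For anyone reproducing this: the same settings can also be applied without editing `postgresql.conf` directly (note that some of them, e.g. `shared_buffers`, only take effect after a server restart):

ALTER SYSTEM SET effective_io_concurrency = 2;
SELECT pg_reload_conf();   -- reload settings that do not require a restart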