Problem
We have a graph in which locations are connected by services. Services that have common key-values are connected by service paths. We want to find all the service paths we can use to get from Location A to Location Z.
The following query matches services that go directly from A to Z and service paths that take one hop to get from A to Z:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..1]->
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p;
and runs fine.
But if we expand the search to service paths that may take two hops between A and Z:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..2]->
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p;
the query runs forever.
It never times out or crashes the server - it just runs continuously.
However, if we make the variable-length part of the query bidirectional:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..2]-
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p;
The same query that ran forever now executes instantaneously (~30ms on a dev database with default Postgres config).
More info
- Behavior in different versions
This problem occurs in AgensGraph 2.2dev (cloned from GitHub). In Agens 2.1.0 , the first query -[:service_path*0..1]->
still works fine, but the broken query -[:service_path*0..2]->
AND the version that works in 2.2dev, -[:service_path*0..2]-
result in an error:
ERROR: btree index keys must be ordered by attribute
This leads us to believe that the problem is related to this commit, which was included as a bug fix in Agens 2.1.1
fix: Make re-scanning inner scan work
VLE threw "btree index keys must be ordered by attribute" because re-scanning inner scan is not properly done in some cases. A regression test is added to check it.
- Duplicate paths returned, endlessly
In AgensBrowser v1.0, we are able to get results out using LIMIT
on the end of the broken query. The query always returns the maximum number of rows, but the resulting graph is very sparse. Direct paths and paths with one hop show up, but only one longer path appears.
In the result set, the shorter paths are returned in individual records as expected, but first occurrence of a two-hop path is duplicated for the rest of the rows.
If we return some collection along with the paths, like RETURN p, nodes(p) LIMIT 100
, the query again runs infinitely.
(Interestingly, this row duplication also occurs for a slightly different query in which we used the bidirectional fix, but the entire expected graph was returned. This may deserve its own post.)
- The EXPLAIN plans are identical for the one-hop and two-hop queries
We could not compare EXPLAIN ANALYZE (because you can't ANALYZE a query that never finishes running) but the query plans were exactly identical between all of the queries - those that ran and those that didn't.
- Increased logging revealed nothing
We set the logging level for Postgres to DEBUG5, the highest level, and ran the infinitely-running query. The logs showed nothing amiss.