
In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?


To illustrate, I want all nodes that are labeled with one of the labels in label_list:

label_list = ['label_1', 'label_2', ...]

Which would be faster in an application (this is pseudo-code):

for label in label_list:
    run.query(f"MATCH (n:{label}) RETURN n")

or

run.query("MATCH (n) WHERE n:label_1 OR n:label_2 OR ... RETURN n")
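To make the difference between the two access paths concrete, here is a toy Python model. It is not Neo4j's storage engine. It assumes a per-label lookup (the hypothetical `LABEL_INDEX`), loosely analogous to the label lookup that NodeByLabelScan relies on, and it counts how many node records each strategy reads. The graph data and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph: node id -> set of labels.
NODES = {
    1: {"label_1"},
    2: {"label_2"},
    3: {"label_1", "label_2"},  # a node can carry several labels
    4: {"other"},
    5: {"other"},
    6: {"other"},
}

# Hypothetical stand-in for a label lookup: label -> ids of nodes with that label.
LABEL_INDEX = defaultdict(list)
for node_id, labels in NODES.items():
    for label in labels:
        LABEL_INDEX[label].append(node_id)


def all_nodes_scan(wanted):
    """Like AllNodesScan + WHERE: read every node, keep those with a wanted label."""
    reads = 0
    result = []
    for node_id, labels in NODES.items():
        reads += 1
        if labels & wanted:  # non-empty intersection means at least one wanted label
            result.append(node_id)
    return result, reads


def label_scans(wanted):
    """Like one NodeByLabelScan per label: read only nodes listed under each label."""
    reads = 0
    result = []
    seen = set()
    for label in sorted(wanted):
        for node_id in LABEL_INDEX.get(label, []):  # .get avoids creating empty keys
            reads += 1
            if node_id not in seen:  # multi-label nodes show up under several labels
                seen.add(node_id)
                result.append(node_id)
    return result, reads
```

With `wanted = {"label_1", "label_2"}`, both functions find nodes 1, 2 and 3, but `all_nodes_scan` reads all 6 records while `label_scans` reads 4 (node 3 twice). In this model the cost of the label-based path grows with the number of matching nodes rather than with the size of the whole graph, and it needs a de-duplication step for nodes that carry more than one wanted label.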


EDIT:

Actually, I just realized that the best option might be to run multiple NodeByLabelScan in a single query, with something looking like this:

MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b, ...] AS foo
RETURN foo

Could someone speak to it?

cybersam
thomas-bc

1 Answer


Yes, it would be better to run multiple NodeByLabelScans in a single query.

For example:

OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n
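In the Neo4j versions current at the time of this answer, labels could not be passed as query parameters, so for an arbitrary `label_list` the query above has to be assembled as a string. Below is a minimal Python sketch that generates the same OPTIONAL MATCH / COLLECT chain. It assumes the label names come from a trusted list. The helper names `quote_label` and `build_label_union_query` are hypothetical. Each label is wrapped in backticks, with any embedded backtick doubled as Cypher's identifier escaping requires.

```python
def quote_label(label: str) -> str:
    """Wrap a label in backticks, doubling any embedded backtick."""
    return "`" + label.replace("`", "``") + "`"


def build_label_union_query(labels: list[str]) -> str:
    """Build the OPTIONAL MATCH / COLLECT / UNWIND query for any list of labels."""
    if not labels:
        raise ValueError("need at least one label")
    lines = []
    for i, label in enumerate(labels):
        var = f"n{i}"  # one fresh variable per label
        lines.append(f"OPTIONAL MATCH ({var}:{quote_label(label)})")
        if i == 0:
            lines.append(f"WITH COLLECT({var}) AS list")
        else:
            # Aggregating here collapses the rows back to one before the next match
            lines.append(f"WITH list + COLLECT({var}) AS list")
    lines.append("UNWIND list AS n")
    lines.append("RETURN DISTINCT n")
    return "\n".join(lines)
```

For `["label_1", "label_2"]` this yields the first four lines of the query above (with variables `n0` and `n1`) followed by `UNWIND list AS n` and `RETURN DISTINCT n`. The resulting string can then be run as a single query, for example with `session.run(...)` from the official neo4j Python driver.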

Notes on the query:

  • It uses OPTIONAL MATCH so that the query can proceed even if a wanted label is not found in the DB.
  • It uses multiple aggregation steps to avoid cartesian products (also see this).
  • And it uses UNWIND so that it can use DISTINCT to return distinct nodes (since a node can have multiple labels).
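The second note comes down to row counts. Here is a toy Python model (not how Cypher executes internally) comparing chained OPTIONAL MATCH clauses returned together with the collect-then-UNWIND approach. The per-label matches are placeholder values. A label with no matches contributes a single `None`, mirroring the null row that OPTIONAL MATCH produces.

```python
from itertools import chain, product

# Hypothetical matches per label (node names are placeholders).
matches = {
    "label_1": ["a1", "a2", "a3"],
    "label_2": ["b1", "b2"],
    "label_3": [],  # a label with no nodes in the DB
}


def optional_match_rows(matches):
    """Rows from OPTIONAL MATCH (a) OPTIONAL MATCH (b) ... RETURN a, b, c:
    every match is combined with every other match (a cartesian product)."""
    columns = [nodes if nodes else [None] for nodes in matches.values()]
    return list(product(*columns))


def collect_then_unwind_rows(matches):
    """Rows after COLLECT following each OPTIONAL MATCH, then UNWIND:
    the lists are concatenated, so the count is a sum, not a product."""
    return list(chain.from_iterable(matches.values()))
```

Here the first approach gives 3 × 2 × 1 = 6 rows and the second gives 5. With 1,000, 500 and 200 matching nodes, the product grows to 100,000,000 rows while the sum stays at 1,700, which is consistent with the slowdown described in the comment below.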
cybersam
Thanks a lot for the additional references! Interestingly, I tried running a simple `OPTIONAL MATCH (a:label_1) OPTIONAL MATCH (b:label_2) OPTIONAL MATCH (c:label_3) RETURN a, b, c` and it is incredibly slow (it does not even return a result). I do not understand why; is `a, b, c` a cartesian product under the hood? – thomas-bc Apr 17 '20 at 21:27