1

I am using redisgraph with a custom implementation of ioredis. The query runs 3 to 6 seconds on a database that has millions of nodes. It basically filters (b:brand) by different relationship counts by adding the following match and where multiple times on different nodes.

(:brand) - 1mil nodes
(:w) - 20mil nodes
(:e) - 10mil nodes
// matching b before this codeblock
MATCH (b)-[:r1]->(p:p)<-[:r2]-(w:w)
WHERE w.deleted IS NULL
WITH count(DISTINCT w) as count, b
WHERE  count >= 0   AND count <= 10

The full query would look like this.

MATCH (b:brand)
WHERE  b.deleted IS NULL
MATCH (b)-[:r1]->(p:p)<-[:r2]-(w:w)
WHERE w.deleted IS NULL
WITH count(DISTINCT w) as count, b
WHERE  count >= 0   AND count <= 10
MATCH (c)-[:r3]->(d:d)<-[:r4]-(e:e)
WHERE e.deleted IS NULL
WITH count(DISTINCT e) as count, b
WHERE  count >= 0   AND count <= 10
WITH b ORDER by b.name asc
WITH count(b) as totalCount, collect({id: b.id)[$cursor..($cursor+$limit)] AS brands
RETURN brands, totalCount

How can I optimize this query as it's really slow?

kakakakakakakk
  • 467
  • 10
  • 31

1 Answers1

2

A few thoughts:

  • Property lookups are expensive; is there a way you can get around all the .deleted checks?
  • If possible, can you avoid naming r1, r2, etc.? It's faster when it doesn't have to check the relationship type.
  • You're essentially traversing the entire graph several times. If the paths b-->p<--w and c-->d<--e don't overlap, you can include them both in the match statement, separated by a comma, and aggregate both counts at once
  • I don't know if it'll help much, but you don't need to name p and d since you never refer to them
  • This is a very small improvement, but I don't see a reason to check count >= 0

Also, I'm sure you have your reasons, but why does the c-->d<--e path matter? This would make more sense to me if it were b-->d<--e to mirror the first portion.

EDIT/UPDATE: A few things I said need clarification:

First bullet: The fastest lookup is on a node label; up to 4 labels are essentially O(0). (Well, for anchor nodes, it's slower for downstream nodes.) The second-fastest lookup is on an INDEXED property. My comment above assumed UNINDEXED lookups.

Second bullet: I think I was just wrong here. Relationships are stored as doubly-linked lists grouped by relationship type. Therefore, always specify relationship type for better performance. Similarly, always specify direction.

Third bullet: What I said is generally correct, HOWEVER beware of Cartesian joins when you have two MATCH statements separated by a comma. In general, you would only use that structure when you have a common element, like you want directors, actors, and cinematographers all connected to a movie. Still, no overlap between these paths.

Vincent Rupp
  • 617
  • 5
  • 13
  • the `count >= 0` is just an example, I have a nestjs graphql api that requires data filtering, and I need to implement filtering an entity by another entity (count). Thanks for the response, will try to improve them – kakakakakakakk Oct 24 '22 at 08:35