How to tell cypher where to make nodehashjoin

Question

I have made an interesting observation about a certain type of query.

My starting query is:

PROFILE 
MATCH (cs:Movie { id: 'm:H01016' }) WITH cs 
MATCH (ms:Actor { id: 'a:111' }) WITH cs,ms  
MATCH p=((cs)--(x0)--(x1)--(x2)--(ms))   
RETURN EXTRACT(n IN nodes(p) | n)  SKIP 0 LIMIT 24

With my data it executes in 141 ms.

With a slight modification of this query (a Director label on x0):

PROFILE 
MATCH (cs:Movie { id: 'm:H01016' }) WITH cs 
MATCH (ms:Actor { id: 'a:111' }) WITH cs,ms  
MATCH p=((cs)--(x0:Director)--(x1)--(x2)--(ms))   
RETURN EXTRACT(n IN nodes(p) | n)  SKIP 0 LIMIT 24

This version takes 7-8 seconds to execute. The only difference I see is where the nodehashjoin happens.

First execution plan is:

And second one looks like:

The difference is quite obvious. In the first query there are 2 expands on either side and the nodehashjoin happens in the middle, while in the second query there are 3 expands from one side, 1 expand from the other, and the nodehashjoin happens towards the end. The 3 expands in the second query lead to over a million db hits. So is there any way to direct where the nodehashjoin must happen?

And here is the expanded version of the slow query. I believe there is nothing strange in it; the only problem is that the nodehashjoin happens in an inappropriate place:

What exactly is in your query where you have written (x0:**Director**) ? I'm asking because that seems not to be valid syntax ... — Tom Geudens, Aug 25 '17 at 05:56
x0:Director. ** opens bolded section like **xxx**. I didn't know why it didn't work and I didn't notice it and ofc in the comment is working. Perhaps it doesn't work in the code section. — user732456, Aug 25 '17 at 06:55
Cool, thanks. Can you expand the two Filters and the Expand(All) where things go "wrong" (where you suddenly have a million db hits) in your second visual and add that to the question ? The quick answer is that you can not force where the nodehashjoin happens but it would be interesting to see (and try to explain) the why ... — Tom Geudens, Aug 25 '17 at 10:04

score 4 · Accepted Answer · answered Aug 25 '17 at 13:35

If you want to change how the query planner behaves, there is a trick you can use. I do not have your dataset to test it, but the following clause can influence your execution plan. With it you can turn an Expand(All) plus a Filter into an Expand(Into) operator:

with * where true
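
For illustration, here is a sketch of where the clause could be placed in the slow query from the question (the variant with the Director label). The placement after the first MATCH is an assumption, and it has not been run against the original data, so whether the planner actually moves the join should be checked with PROFILE:

```cypher
PROFILE
MATCH (cs:Movie { id: 'm:H01016' })
// No-op filter: the predicate is always true, so the result set is unchanged.
// It is only there to influence how the planner splits up the query.
WITH * WHERE true
MATCH (ms:Actor { id: 'a:111' })
WITH cs, ms
MATCH p=((cs)--(x0:Director)--(x1)--(x2)--(ms))
RETURN EXTRACT(n IN nodes(p) | n) SKIP 0 LIMIT 24
```

If the plan does not change, moving the clause to just before the final MATCH is another position worth profiling.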

answered Aug 25 '17 at 13:35

szenyo


Where exactly to include it as I don't see any difference so far? – user732456 Aug 26 '17 at 08:18
Somewhere after your WITHs. Firstly I would try like this and check the profiling after: PROFILE MATCH (cs:Movie { id: 'm:H01016' }) WITH cs WITH * WHERE true MATCH (ms:Actor { id: 'a:111' }) WITH cs,ms MATCH p=((cs)--(x0)--(x1)--(x2)--(ms)) RETURN EXTRACT(n IN nodes(p) | n) SKIP 0 LIMIT 24 – szenyo Aug 26 '17 at 08:21
Indeed query runs much faster now. – user732456 Aug 27 '17 at 13:51
I can't understand how this works and I can't find any resources on it either. Is it going to work if there is a set of nodes on one end of the route, like this: PROFILE MATCH (cs:Movie { id: 'm:H01016' }) WITH cs WITH * WHERE true MATCH (ms:Actor) WHERE ms.id in [ 'a:111', 'a:112'] WITH cs,ms MATCH p=((cs)--(x0)--(x1)--(x2)--(ms)) RETURN EXTRACT(n IN nodes(p) | n) SKIP 0 LIMIT 24? – user732456 Aug 28 '17 at 16:12
It should work. It is a trick to put a (meaningless) cypher segment into the query, and the cypher optimiser thinks that it should optimize the query for this with clause. Nothing else. Similar story is here: https://stackoverflow.com/questions/40725765/slow-performance-bulk-updating-relationship-properties-in-neo4j/40726776#40726776 – szenyo Aug 30 '17 at 07:29
