I have a very interesting observation about a certain type of query.
My starting query is:
PROFILE
MATCH (cs:Movie { id: 'm:H01016' }) WITH cs
MATCH (ms:Actor { id: 'a:111' }) WITH cs,ms
MATCH p=((cs)--(x0)--(x1)--(x2)--(ms))
RETURN EXTRACT(n IN nodes(p) | n) SKIP 0 LIMIT 24
With my data it executes in 141 ms.
With a slight modification of this query:
PROFILE
MATCH (cs:Movie { id: 'm:H01016' }) WITH cs
MATCH (ms:Actor { id: 'a:111' }) WITH cs,ms
MATCH p=((cs)--(x0:Director)--(x1)--(x2)--(ms))
RETURN EXTRACT(n IN nodes(p) | n) SKIP 0 LIMIT 24
it takes 7-8 seconds to execute. The only change to the query is the :Director label on x0, and the only difference I see in the plans is where the NodeHashJoin happens.
First execution plan is:
And second one looks like:
The difference is quite obvious. In the first query there are 2 expands on either side and the NodeHashJoin happens in the middle, while in the second query there are 3 expands from one side, 1 expand from the other, and the NodeHashJoin happens towards the end. The 3 expands in the second query lead to over a million db hits. So is there any way to control where the NodeHashJoin happens?
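To illustrate the kind of control I am looking for: a planner hint on the slow query that forces the join onto the middle node x1, so that both sides expand 2 steps as in the fast plan. This is only a sketch. It assumes a Neo4j version whose Cypher planner supports the `USING JOIN ON` hint, and I have not verified that the planner would honor it for this pattern:

```cypher
PROFILE
MATCH (cs:Movie { id: 'm:H01016' }) WITH cs
MATCH (ms:Actor { id: 'a:111' }) WITH cs,ms
MATCH p=((cs)--(x0:Director)--(x1)--(x2)--(ms))
// Assumption: the join hint is available in this Neo4j version.
// It asks the planner to expand from cs and from ms and join the two sides on x1.
USING JOIN ON x1
RETURN EXTRACT(n IN nodes(p) | n) SKIP 0 LIMIT 24
```

If the hint is accepted, the PROFILE output should show the NodeHashJoin on x1 with 2 expands below it on each side.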
And here is the expanded version of the slow query. I don't believe there is anything strange in it; the NodeHashJoin simply happens in an inappropriate place: