How to optimize a Cypher query using an OPTIONAL MATCH clause in a big graph database?

Question

When I use an OPTIONAL MATCH clause in a simple Cypher query like the following:

MATCH ()-[R0:relationshipclazz0]-() 
OPTIONAL MATCH ()-[R0:relationshipclazz0]->(N0:entityclazz0)
WITH distinct R0, R0.att0 as AR0att0, N0  
WITH ID(R0) as i,   R0.att0 as O1,  (N0.att0) as O2, R0 
RETURN  O1, O2, count(i) 
ORDER BY  O1, O2

This query takes 381 seconds on a graph database with 50,000 relationships and 6,000 nodes.

Do you have any idea how I can optimize this query? I have to use OPTIONAL MATCH because there are null values in my database that I want to discover; with only a MATCH clause, I don't get the null values.

Thanks in advance

Are there any other nodes with labels you can include in the initial match? Relationship lookups like this tend to be inefficient, much easier if you can find starting nodes by label. You also seem to be performing projections that are never used later, those can probably be cleaned up. And just to make sure we have the right idea, can you provide a verbal description of what you are trying to do, and your desired output? — InverseFalcon, Mar 13 '17 at 21:56
No, in this case I don't have a node to include in the first MATCH clause. In fact, I have two types of queries: 1. queries that start from entities (nodes); 2. queries that start from relationships. I want to replace SQL join queries on a relational database with Cypher queries in Neo4j, and in SQL we can start from a relation class, which corresponds to a relationship in a graph database. So, for the purpose of my thesis research, I have to write these queries. — Marwa EL Abri, Mar 13 '17 at 22:17
I found on the web that I can add relationship indexes in Neo4j using APOC procedures, but I didn't really understand how to use or implement them. — Marwa EL Abri, Mar 13 '17 at 22:17
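Following the first comment's point about projections that are never used, here is a trimmed-down sketch of the question's query. It assumes the same type, label and property names as the question (`relationshipclazz0`, `entityclazz0`, `att0`). Matching with a direction (`->`) visits each relationship once instead of twice, so the `DISTINCT` is no longer needed, and because `R0` is already bound, the `OPTIONAL MATCH` yields at most one end node per relationship (or null). It should produce the same counts as the original, but it still has to find relationships by type, so on its own it may not fix the run time.

```cypher
// Directed pattern: each :relationshipclazz0 relationship is matched exactly once
MATCH ()-[R0:relationshipclazz0]->()
// R0 is already bound, so this only checks whether its end node is an :entityclazz0;
// N0 is null when it is not, which keeps the null values the question wants to see
OPTIONAL MATCH ()-[R0]->(N0:entityclazz0)
RETURN R0.att0 AS O1, N0.att0 AS O2, count(R0) AS cnt
ORDER BY O1, O2
```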

score 0 · Accepted Answer · answered Mar 13 '17 at 23:26


The big problem with these kinds of queries, looking up by relationships instead of by using starting nodes or node labels, is that there isn't an efficient means of looking up relationships by type in Neo4j at this point. Currently, your query must look at all nodes in the database and check all of their relationships to find the ones of the correct type.
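One way to see this cost is to prefix a query with `PROFILE` and read the plan it returns. As a sketch using the question's names: for a pattern constrained only by relationship type, the plan in Neo4j 3.1 would typically begin with an `AllNodesScan` feeding an `Expand(All)`, which is exactly the scan over every node and its relationships described above. Operator names and plans can differ between Neo4j versions.

```cypher
// PROFILE runs the query and reports each plan operator with its rows and db hits
PROFILE
MATCH ()-[R0:relationshipclazz0]->()
// Grouping on a property (rather than returning only count(R0)) forces the
// relationships to actually be read instead of possibly being answered from the count store
RETURN R0.att0 AS O1, count(*) AS cnt
```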

You can use APOC Procedures (use the correct APOC version based on your Neo4j version) to add manual indexes on your relationships. This does require some non-null property on all your relationships for the index lookup to work (we can probably use att0, provided that it's present on all your :relationshipclazz0 relationships).

We first need to manually add all of these relationships to your index:

MATCH ()-[r:relationshipclazz0]->()
CALL apoc.index.addRelationship(r,['att0'])
RETURN count(*)
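Before relying on the index, it can be worth checking that it holds as many relationships as the graph does. This sketch reuses the `apoc.index.relationships` call and the `'att0:*'` query from this answer; the two counts should match if every relationship was indexed, since a relationship without `att0` would be missing from the index.

```cypher
// Number of :relationshipclazz0 relationships reachable through the manual index
CALL apoc.index.relationships('relationshipclazz0','att0:*') YIELD rel
WITH count(rel) AS indexed
// Number of :relationshipclazz0 relationships in the graph (directed, so each is counted once)
MATCH ()-[r:relationshipclazz0]->()
RETURN indexed, count(r) AS total
```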

Now we can query from the index:

CALL apoc.index.relationships('relationshipclazz0','att0:*') YIELD rel as R0
OPTIONAL MATCH ()-[R0]->(N0:entityclazz0)
WITH R0, R0.att0 as O1,  N0.att0 as O2
RETURN  O1, O2, count(R0) 
ORDER BY  O1, O2

InverseFalcon

The problem is that I cannot install APOC. I use Neo4j 3.1.0, so I installed APOC version 3.1.0.3 as mentioned on the web. I placed the jar file in the "C:\Program Files\Neo4j CE 3.1.0\plugins" folder, then restarted my server and tested with the query "CALL apoc.meta.graph", but I get an error: There is no procedure with the name `apoc.meta.graph` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed. Could you help me get APOC procedures working? Thanks in advance – Marwa EL Abri Mar 13 '17 at 23:49

See http://stackoverflow.com/questions/42740355/how-to-install-apoc-for-neo4j/42741433#42741433 – cybersam Mar 13 '17 at 23:52
Thank you, it works :) – Marwa EL Abri Mar 14 '17 at 00:17
Note that the index lookup will only work for relationships that have the `att0` property. If some relationships lack this property, they won't be picked up by the index. You can check how many relationships are missing it by executing `MATCH ()-[R0:relationshipclazz0]->() WHERE R0.att0 IS NULL RETURN count(R0)` – InverseFalcon Mar 14 '17 at 00:39
No, all :relationshipclazz0 relationships have the att0 attribute: when I execute `MATCH ()-[R0:relationshipclazz0]-() WHERE R0.att0 IS NULL RETURN count(R0)` I get 0, and I don't have null values in att0. – Marwa EL Abri Mar 14 '17 at 00:53
I'm sorry, I verified the results and both queries give me the same counts, but it still takes a long time when the database is large. I have 50,000 relationships in my database, so it is not very big, and when I execute this query: `CALL apoc.index.relationships('relationshipclazz0','att0:*') YIELD rel as R0 OPTIONAL MATCH ()-[R0]->(N0:entityclazz0) WITH distinct R0, R0.att0 as AR0att0, N0 WITH ID(R0) as i, R0.att0 as O1, (N0.att0) as O2, R0 RETURN O1, O2, count(i) ORDER BY O1, O2` it takes 265798 ms. Do you have any idea how to optimize this Cypher query, since even the APOC index doesn't solve the problem? – Marwa EL Abri Mar 16 '17 at 20:03
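Since a relationship has exactly one end node, the second pattern does not strictly need to be matched: the end node can be bound in the first `MATCH` and its label checked with a `CASE` expression, which still gives a null `O2` when the end node is not an `:entityclazz0`. This is a sketch using the question's names, not something measured on this data set; the variable `n` is just an illustrative name (avoid `end`, which is a Cypher keyword). It still reads every `:relationshipclazz0` relationship, so it trims the per-relationship work rather than the scan itself.

```cypher
// Bind the end node directly; the direction ensures each relationship is matched once
MATCH ()-[R0:relationshipclazz0]->(n)
// Replaces OPTIONAL MATCH ()-[R0]->(N0:entityclazz0): O2 is null unless the end node
// carries the :entityclazz0 label (and also when that node has no att0 value)
RETURN R0.att0 AS O1,
       CASE WHEN n:entityclazz0 THEN n.att0 END AS O2,
       count(R0) AS cnt
ORDER BY O1, O2
```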
