My data model is
(:Parent {parentId:...})-[:CONTAINS]->(:Child)-[:SUPPORTS]->(:Dummy)
Every Child has only one Parent. The Parent.parentId attribute is unique, i.e. there is a constraint defined as
CREATE CONSTRAINT Parent_parentId_unique IF NOT EXISTS ON (p:Parent) ASSERT p.parentId IS UNIQUE;
I have a user-defined @Procedure which accepts a collection of parentIds, and I want to remove all :SUPPORTS relationships from all their children.
- When everything is done in Cypher, the procedure is slow: it takes hundreds of milliseconds to seconds.
Result removeRelationshipResult = tx.execute(""
        + "MATCH (p:Parent)-[:CONTAINS]->(c:Child)-[r:SUPPORTS]->(:Dummy)\n"
        + "WHERE p.parentId IN $parentIds\n"
        + "DELETE r",
    Map.of("parentIds", parentIds)
);
- When I loop through all relationships programmatically, the execution is fast, under 10 milliseconds (streamOf is a utility method that converts an Iterable to a Stream).
RelationshipType CONTAINS = RelationshipType.withName("CONTAINS");
RelationshipType SUPPORTS = RelationshipType.withName("SUPPORTS");
for (Long parentId : parentIds) {
    Node parentNode = tx.findNode(Parent.LABEL, "parentId", parentId);
    streamOf(parentNode.getRelationships(Direction.OUTGOING, CONTAINS))
        .map(rel -> rel.getEndNode())
        .flatMap(childNode -> streamOf(childNode.getRelationships(SUPPORTS)))
        .forEach(Relationship::delete);
}
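The streamOf helper itself is not shown in the question. A minimal sketch of what such a utility might look like, assuming it simply wraps the Iterable with the standard StreamSupport class (the class name Streams is hypothetical):

```java
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical holder class; the question only names the method streamOf.
final class Streams {
    private Streams() {}

    // Wraps any Iterable in a sequential Stream. Elements are pulled lazily
    // from the Iterable's spliterator, so nothing is copied up front.
    static <T> Stream<T> streamOf(Iterable<T> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }
}
```

With a helper like this, the loop above never materializes an intermediate collection; each relationship is visited and deleted as it is read.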
The difference appears even on the first run, when no :SUPPORTS relationships exist yet.
What could cause such a difference, and how can I spot it?
UPDATE: Reaction to @cybersam's answer (too long and unformattable for a comment):
I tested your suggestions on a sample with 1737 Parent nodes and 655344 :SUPPORTS relationships, split into 61 batches (useful for the warm-up from point #2, though the primary purpose of the split was different).
Applying point #1 brought a huge performance improvement; the time is now comparable to the programmatic implementation. I also tried the reverse change in the programmatic implementation, i.e. adding filtering on node labels, but it had no significant effect. Note that the timings differ between the first run (when no relationships exist) and the second run (when the relationships from the first run are actually deleted). The third and subsequent runs are similar to the second run. Point #1 clearly answers my question and helped me a lot, thanks!
implementation | first run: deletion time / total procedure time [ms] | second run: deletion time / total procedure time [ms] |
---|---|---|
cypher-labels | 79366/131261 | 170283/188783 |
cypher-nolabels | 230/13756 | 1800/17284 |
program-labels | 155/11731 | 2235/19539 |
program-nolabels | 174/11805 | 2079/19111 |
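The exact query behind the cypher-nolabels row is not shown. The sketch below is a guess at the two Cypher variants being compared, assuming the faster one keeps the :Parent label (so the unique constraint on parentId can still be used for the lookup) and drops the :Child and :Dummy label checks. The class and constant names are hypothetical. The profiled helper prepends Cypher's PROFILE keyword, which runs the query and reports the executed plan with rows and db hits per operator; comparing the two plans is one way to see where the time goes.

```java
// Hypothetical holder for the two query variants; only the first query
// appears verbatim in the question, the second is an assumption.
final class SupportsQueries {
    private SupportsQueries() {}

    // Original query: every node in the pattern carries a label check.
    static final String WITH_LABELS =
            "MATCH (p:Parent)-[:CONTAINS]->(c:Child)-[r:SUPPORTS]->(:Dummy)\n"
          + "WHERE p.parentId IN $parentIds\n"
          + "DELETE r";

    // Assumed "nolabels" variant: only :Parent keeps its label; the
    // relationship types alone identify the child and dummy nodes.
    static final String WITHOUT_EXTRA_LABELS =
            "MATCH (p:Parent)-[:CONTAINS]->()-[r:SUPPORTS]->()\n"
          + "WHERE p.parentId IN $parentIds\n"
          + "DELETE r";

    // Prefixes a query with PROFILE so the executed plan is returned
    // together with the query result.
    static String profiled(String query) {
        return "PROFILE " + query;
    }
}
```

Either string can be passed to tx.execute with the same parentIds parameter map as in the question. Dropping the labels is only safe if, as the data model states, CONTAINS always leads to a Child and SUPPORTS always leads to a Dummy.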