I have CSV files that contain thousands of rows each; the file sizes range from 500 MB to 3.1 GB. I first did a bulk import, and it took only a few minutes to load all the data into the graph DB. Now, for my project, I need to upload data on a regular basis, so I have written a Python script using the Neo4j Bolt driver that performs all the regular node creates, updates, and deletes. Creating relationships from files also works for a small amount of data (my prototype).

The problem occurs when I create relationships from large files. Parallelism works, but it gets very slow; my 32-core CPU is fully used (I checked with htop). With a batch size of 100-1000 the cores are used properly; with a batch size of 10000-100000, parallelism does not work. Here is my query for creating the relationships with LOAD CSV:
"""CALL apoc.periodic.iterate('
load csv with headers from "file:///x.csv" AS row return row
','
MERGE (p1:A {ID: row.A})
MERGE (p2:B {ID: row.B})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""
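For reference, my Python script runs this query through the Bolt driver roughly like the minimal sketch below (the URI and credentials here are placeholders, not my real ones):

from neo4j import GraphDatabase

# Placeholder connection details, not my real endpoint/credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def run_relationship_load(query):
    # apoc.periodic.iterate does all the batching and parallelism
    # server-side, so one session.run() per file is enough;
    # consume() blocks until the whole CALL has finished
    with driver.session() as session:
        return session.run(query).consume()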
It works totally fine for a small amount of data, but it gets very slow with large files: creating 10 relationships took roughly 39 seconds. Is the MERGE operation inefficient in my case, or am I missing some trick here? Kindly help me solve this. I am working on an EC2 instance with 240 GB of RAM. I have tried apoc.warmup.run; it warmed up about 192 GB, but no significant change was observed.
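The warm-up call was something along these lines (I am not sure of the exact flags I passed, so treat the booleans as an assumption):

CALL apoc.warmup.run(true, true, true)  // loadProperties, loadDynamicProperties, loadIndexes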