
We have a lot of (n1:EffortUser)-[r1:EFFORT]->(n2:EffortObject) relationships that need to be counted by day and week, i.e. how many EffortObject:Email nodes an EffortUser has SENT. With a lot of users and emails this can take quite some time, so we would like to parallelize this query.

Right now we are using:

match(n1:EffortUser)-[r1:EFFORT]-(n2:EffortObject) 
where r1.Effort = 'yes' and r1.TimeEvent>='2017-01-01' and r1.TimeEvent<='2017-12-31'
return distinct n1.Name as User, date(datetime(r1.TimeEvent)) as date, count(distinct r1.IdUnique) as count
order by User, date

There seem to be a few options to parallelize / optimize this but all are rather poorly documented.

I did a bit of research and found the following APOC procedures, but try as I might I cannot get them to work (and I could not find much on Stack Overflow either). Which of the options below is best, ideally with an example using the sample query above? This is driving me nuts. We have 4 cores and 32 GB of memory, so this should run pretty fast, but I just cannot get it to work.

https://neo4j.com/docs/labs/apoc/current/cypher-execution/

CALL apoc.cypher.runMany('cypher;\nstatements;',{params},{config})

runs each semicolon separated statement and returns summary - currently no schema operations

CALL apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value

executes fragment in parallel batches with the list segments being assigned to _

https://neo4j.com/docs/labs/apoc/current/cypher-execution/running-cypher/

apoc.cypher.mapParallel(fragment :: STRING?, params :: MAP?, list :: LIST? OF ANY?) :: (value :: MAP?)

apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value - executes fragment in parallel batches with the list segments being assigned to _

apoc.cypher.mapParallel2(fragment :: STRING?, params :: MAP?, list :: LIST? OF ANY?, partitions :: INTEGER?, timeout = 10 :: INTEGER?) :: (value :: MAP?)

apoc.cypher.mapParallel2(fragment, params, list-to-parallelize) yield value - executes fragment in parallel batches with the list segments being assigned to _

apoc.cypher.parallel(fragment :: STRING?, params :: MAP?, parallelizeOn :: STRING?) :: (value :: MAP?)

apoc.cypher.parallel2(fragment :: STRING?, params :: MAP?, parallelizeOn :: STRING?) :: (value :: MAP?)

apoc.cypher.runMany(cypher :: STRING?, params :: MAP?, config = {} :: MAP?) :: (row :: INTEGER?, result :: MAP?)

apoc.cypher.runMany('cypher;\nstatements;',{params},[{statistics:true,timeout:10}]) - runs each semicolon separated statement and returns summary - currently no schema operations
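For reference, here is a rough sketch of how apoc.cypher.mapParallel2 might be applied to the query above. It is untested and rests on several assumptions: that APOC is installed, that `_` is bound to a single EffortUser node per row inside the fragment, that EFFORT relationships point from the user to the object (the original query is undirected), and that 4 partitions is a sensible value for 4 cores. Since each user lands in exactly one partition, the per-user, per-day counts should match the sequential query as long as Name is unique per user.

```cypher
// Sketch only, not a verified solution.
// Collect the users once, then let APOC split the list into partitions.
MATCH (u:EffortUser)
WITH collect(u) AS users
CALL apoc.cypher.mapParallel2(
  // Fragment executed per partition; `_` is assumed to be one user node.
  "MATCH (_)-[r1:EFFORT]->(n2:EffortObject)
   WHERE r1.Effort = 'yes'
     AND r1.TimeEvent >= '2017-01-01'
     AND r1.TimeEvent <= '2017-12-31'
   RETURN _.Name AS User,
          date(datetime(r1.TimeEvent)) AS date,
          count(DISTINCT r1.IdUnique) AS count",
  {},     // no extra parameters; the date bounds are inlined above
  users,  // list to parallelize
  4       // partitions (assumed: one per core)
) YIELD value
RETURN value.User AS User, value.date AS date, value.count AS count
ORDER BY User, date
```

The fourth argument is the partition count from the mapParallel2 signature quoted above; the timeout is left at its default of 10.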

James Z
    In case you find this interesting, the discussion and solution were developed in this thread: https://community.neo4j.com/t/how-best-to-do-parallel-processing/13342 – Christian Bartens Jan 06 '20 at 23:19
