I have a graph database in AWS neptune which has 60M nodes. A simple count query to count all nodes takes ~6-7 minutes.
query is:
MATCH (n)
RETURN count(n) as count
Is 6 minutes to count 60 million nodes normal? what can I do to make it faster?
Explain (debug) result for this query:
Query:
// all nodes count
MATCH (n)
RETURN count(n) as cnt
ID Out #1 Out #2 Name Arguments Mode Units In Units Out Ratio Time (ms) Chunks In Chunks Out Units In (per chunk) Units Out (per chunk) Invocations OutWaitMs Out Blocked Rate [M/s] GCElapsedMs blocksIncGC blocksDecGC progressCount init [ms] done [ms] finalize [ms]
0 1 - SolutionInjection solutions=[{}] - 0 1 0.00 0 0 0 0.00 0.00 0 0 0 NaN
1 2 - DFESubquery subQuery=subQuery1
partitionId=0
details= ====> DFE execution time toPASTModel [micros]=213 accepted [micros]=60 ready [micros]=214 running [micros]=68035522 finished [micros]=0 ===> DFE execution time (measured in DFENode) -> setupTime [ms]=0 -> executionTime [ms]=68038 -> resultReadTime [ms]=0 ====> Original AST: DFEJoinGroupNode[]( children=[ DFEProjectionNode[NONE]( projectedVars=[?cnt], child=DFEAggregationNode[NONE]( groupByVars=[], aggregateExpressions=[ DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)], child=DFEJoinGroupNode[]( children=[ DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901})], opInfo=none), opInfo=none), opInfo=none)], opInfo=none) ====> Preprocessed AST: DFEProjectionNode[NONE]( projectedVars=[?cnt], child=DFEAggregationNode[NONE]( groupByVars=[], aggregateExpressions=[ DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)], child=DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901}), opInfo=(type=NoneOperatorStub, cost=(exp=(empty),wc=(empty)))), opInfo=(type=SubQuery, cost=(exp=(empty),wc=(empty)))) ===> DFE configuration (given) solutionChunkSize=100000 outputQueueSize=20 numComputeCores=3 maxParallelIO=5 numInitialPermits=0 frontiersAsInFilters=true partitionId=0 isExplainRequested=true languageSpecifier=Open_Cypher planVariant=all/BLOCKING readForUpdate=false ====> DFE configuration (reported) numComputeCores=3 numIOThreads=1 numInitialPermits=1 permitsSecured=1732 ===> Top level Statistics & operator histogram ==> Statistics -> 68032185 / 68026188 micros total elapsed (incl. wait / excl. wait) -> 68032 / 68026 millis total elapsed (incl. wait / excl. wait) -> 68 / 68 secs total elapsed (incl. wait / excl. wait) ==> GC Summary -> 40.68ms spent in GC (0.06% of total time) ==> Operator histogram (all times are excluding wait) -> Total Operator #instances: 4 Operator │ Time(ms) │ Time(%) │ rowsIn │ rowsOut │ chunksIn │ chunksOut │ instances │ invocation │ in(M/s) │ out(M/s) │ time/chunkIn(ms) │ time/chunkOut(ms) │ time/invoc(ms) │ GC(ms) │ GC(%) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── pipelineScan │ 67518 │ 99.25% │ 0 │ 60341570 │ 0 │ 1751 │ 1 │ 1 │ 0.00 │ 0.89 │ - │ 38.56 │ 67518 │ 17.67 │ 0.03 merge │ 415 │ 0.61% │ 60341570 │ 60341570 │ 1751 │ 1 │ 1 │ 1 │ 145.53 │ 145.53 │ 0.24 │ 415 │ 415 │ 6.05 │ 1.46 daslReduce │ 93.86 │ 0.14% │ 60341570 │ 1 │ 1 │ 1 │ 1 │ 1 │ 642.91 │ 0.00 │ 93.86 │ 93.86 │ 93.86 │ 16.94 │ 18.05 drain │ 0.06 │ 0.00% │ 1 │ 0 │ 1 │ 0 │ 1 │ 1 │ 0.02 │ 0.00 │ 0.06 │ - │ 0.06 │ 0.02 │ 25.00 - 0 1 0.00 68048 0 1 0.00 1.00 2 0 0 0.00
2 - - TermResolution vars=[?cnt] id2value_opencypher 1 1 1.00 1.00 1 1 1.00 1.00 1 0 0 0.00
subQuery1
ID Out #1 Out #2 Name Arguments Mode Units In Units Out Ratio Time (ms) Chunks In Chunks Out Units In (per chunk) Units Out (per chunk) Invocations OutWaitMs Out Blocked Rate [M/s] GCElapsedMs blocksIncGC blocksDecGC progressCount init [ms] done [ms] finalize [ms]
0 1 - DFEPipelineScan pattern=distinct ?n (?n,rdf:type,?n_label1,neptune:DefaultNamedGraph)
outSchema=[?n]
patternEstimate=78682901 - 0 60341570 0.00 67518 0 1751 0.00 34461.21 1 5.97 0 0.89
1 2 - DFEMergeChunks inSchema=[?n]
outSchema=[?n] - 60341570 60341570 1.00 415 1751 1 34461.21 60341570.00 1 0.01 0 145.53
2 3 - DFEReduce functor=countWithoutNulls(?n)
inSchema=[?n]
outSchema=[?cnt] - 60341570 1 0.00 93.86 1 1 60341570.00 1.00 1 0.01 0 0.00
3 - - DFEDrain inSchema=[?cnt]
outSchema=[?cnt] - 1 0 0.00 0.06 1 0 1.00 0.00 1 0 0 0.00
RUN THIS SNIPPET BELOW FOR BETTER READABILITY
<!DOCTYPE html>
<html>
<body>
<h3>Query:</h3>
<p>// all nodes count<br/>MATCH (n)<br/>RETURN count(n) as cnt<br/></p>
<br/>
<table border="1px">
<thead>
<tr>
<th>ID</th>
<th>Out #1</th>
<th>Out #2</th>
<th>Name</th>
<th>Arguments</th>
<th>Mode</th>
<th>Units In</th>
<th>Units Out</th>
<th>Ratio</th>
<th>Time (ms)</th>
<th>Chunks In</th>
<th>Chunks Out</th>
<th>Units In (per chunk)</th>
<th>Units Out (per chunk)</th>
<th>Invocations</th>
<th>OutWaitMs</th>
<th>Out Blocked</th>
<th>Rate [M/s]</th>
<th>GCElapsedMs</th>
<th>blocksIncGC</th>
<th>blocksDecGC</th>
<th>progressCount</th>
<th>init [ms]</th>
<th> done [ms]</th>
<th>finalize [ms]</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>-</td>
<td>SolutionInjection</td>
<td>solutions=[{}]</td>
<td>-</td>
<td>0</td>
<td>1</td>
<td>0.00</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0.00</td>
<td>0.00</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>NaN</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>-</td>
<td>DFESubquery</td>
<td>subQuery=subQuery1<br>partitionId=0<br>details=
====> DFE execution time
toPASTModel [micros]=213
accepted [micros]=60
ready [micros]=214
running [micros]=68035522
finished [micros]=0
===> DFE execution time (measured in DFENode)
-> setupTime [ms]=0
-> executionTime [ms]=68038
-> resultReadTime [ms]=0
====> Original AST:
DFEJoinGroupNode[](
children=[
DFEProjectionNode[NONE](
projectedVars=[?cnt],
child=DFEAggregationNode[NONE](
groupByVars=[],
aggregateExpressions=[
DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)],
child=DFEJoinGroupNode[](
children=[
DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901})],
opInfo=none),
opInfo=none),
opInfo=none)],
opInfo=none)
====> Preprocessed AST:
DFEProjectionNode[NONE](
projectedVars=[?cnt],
child=DFEAggregationNode[NONE](
groupByVars=[],
aggregateExpressions=[
DFEAggregateExpression(aggregate=DFEBindNode(countWithoutNulls(?n) AS ?cnt), isDistinct=false)],
child=DFEPatternNode((?n, TermId(782U)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], ?n_label1, TermId(526U)[http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph]) . project DISTINCT[?n] {rangeCountEstimate=78682901}),
opInfo=(type=NoneOperatorStub, cost=(exp=(empty),wc=(empty)))),
opInfo=(type=SubQuery, cost=(exp=(empty),wc=(empty))))
===> DFE configuration (given)
solutionChunkSize=100000
outputQueueSize=20
numComputeCores=3
maxParallelIO=5
numInitialPermits=0
frontiersAsInFilters=true
partitionId=0
isExplainRequested=true
languageSpecifier=Open_Cypher
planVariant=all/BLOCKING
readForUpdate=false
====> DFE configuration (reported)
numComputeCores=3
numIOThreads=1
numInitialPermits=1
permitsSecured=1732
===> Top level Statistics & operator histogram
==> Statistics
-> 68032185 / 68026188 micros total elapsed (incl. wait / excl. wait)
-> 68032 / 68026 millis total elapsed (incl. wait / excl. wait)
-> 68 / 68 secs total elapsed (incl. wait / excl. wait)
==> GC Summary
-> 40.68ms spent in GC (0.06% of total time)
==> Operator histogram (all times are excluding wait)
-> Total Operator #instances: 4
Operator │ Time(ms) │ Time(%) │ rowsIn │ rowsOut │ chunksIn │ chunksOut │ instances │ invocation │ in(M/s) │ out(M/s) │ time/chunkIn(ms) │ time/chunkOut(ms) │ time/invoc(ms) │ GC(ms) │ GC(%)
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pipelineScan │ 67518 │ 99.25% │ 0 │ 60341570 │ 0 │ 1751 │ 1 │ 1 │ 0.00 │ 0.89 │ - │ 38.56 │ 67518 │ 17.67 │ 0.03
merge │ 415 │ 0.61% │ 60341570 │ 60341570 │ 1751 │ 1 │ 1 │ 1 │ 145.53 │ 145.53 │ 0.24 │ 415 │ 415 │ 6.05 │ 1.46
daslReduce │ 93.86 │ 0.14% │ 60341570 │ 1 │ 1 │ 1 │ 1 │ 1 │ 642.91 │ 0.00 │ 93.86 │ 93.86 │ 93.86 │ 16.94 │ 18.05
drain │ 0.06 │ 0.00% │ 1 │ 0 │ 1 │ 0 │ 1 │ 1 │ 0.02 │ 0.00 │ 0.06 │ - │ 0.06 │ 0.02 │ 25.00</td>
<td>-</td>
<td>0</td>
<td>1</td>
<td>0.00</td>
<td>68048</td>
<td>0</td>
<td>1</td>
<td>0.00</td>
<td>1.00</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>0.00</td>
</tr>
<tr>
<td>2</td>
<td>-</td>
<td>-</td>
<td>TermResolution</td>
<td>vars=[?cnt]</td>
<td>id2value_opencypher</td>
<td>1</td>
<td>1</td>
<td>1.00</td>
<td>1.00</td>
<td>1</td>
<td>1</td>
<td>1.00</td>
<td>1.00</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0.00</td>
</tr>
</tbody>
</table>
<h3>subQuery1</h3>
<table border="1px">
<thead>
<tr>
<th>ID</th>
<th>Out #1</th>
<th>Out #2</th>
<th>Name</th>
<th>Arguments</th>
<th>Mode</th>
<th>Units In</th>
<th>Units Out</th>
<th>Ratio</th>
<th>Time (ms)</th>
<th>Chunks In</th>
<th>Chunks Out</th>
<th>Units In (per chunk)</th>
<th>Units Out (per chunk)</th>
<th>Invocations</th>
<th>OutWaitMs</th>
<th>Out Blocked</th>
<th>Rate [M/s]</th>
<th>GCElapsedMs</th>
<th>blocksIncGC</th>
<th>blocksDecGC</th>
<th>progressCount</th>
<th>init [ms]</th>
<th> done [ms]</th>
<th>finalize [ms]</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>-</td>
<td>DFEPipelineScan</td>
<td>pattern=distinct ?n (?n,rdf:type,?n_label1,neptune:DefaultNamedGraph)<br>outSchema=[?n]<br>patternEstimate=78682901</td>
<td>-</td>
<td>0</td>
<td>60341570</td>
<td>0.00</td>
<td>67518</td>
<td>0</td>
<td>1751</td>
<td>0.00</td>
<td>34461.21</td>
<td>1</td>
<td>5.97</td>
<td>0</td>
<td>0.89</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>-</td>
<td>DFEMergeChunks</td>
<td>inSchema=[?n]<br>outSchema=[?n]</td>
<td>-</td>
<td>60341570</td>
<td>60341570</td>
<td>1.00</td>
<td>415</td>
<td>1751</td>
<td>1</td>
<td>34461.21</td>
<td>60341570.00</td>
<td>1</td>
<td>0.01</td>
<td>0</td>
<td>145.53</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>-</td>
<td>DFEReduce</td>
<td>functor=countWithoutNulls(?n)<br>inSchema=[?n]<br>outSchema=[?cnt]</td>
<td>-</td>
<td>60341570</td>
<td>1</td>
<td>0.00</td>
<td>93.86</td>
<td>1</td>
<td>1</td>
<td>60341570.00</td>
<td>1.00</td>
<td>1</td>
<td>0.01</td>
<td>0</td>
<td>0.00</td>
</tr>
<tr>
<td>3</td>
<td>-</td>
<td>-</td>
<td>DFEDrain</td>
<td>inSchema=[?cnt]<br>outSchema=[?cnt]</td>
<td>-</td>
<td>1</td>
<td>0</td>
<td>0.00</td>
<td>0.06</td>
<td>1</td>
<td>0</td>
<td>1.00</td>
<td>0.00</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0.00</td>
</tr>
</tbody>
</table>
</body>
</html>