2

Have run into an issue with using plain old TinkerGraph to drop a moderate sized number of vertices. In total, there are about 1250 vertices and 2500 edges that will be dropped.

When running the following:

g.V(ids).drop().iterate()

It takes around 20-30 seconds. This seems ridiculous and I have seemingly verified that it is not caused by anything other than the removal of the nodes.

I'm hoping there is some key piece that I am missing or an area I have yet to explore that will help me out here.

The environment is not memory or CPU constrained in any way. I've profiled the code and see the majority of the time spent is in the TinkerVertex.remove method. This is doubly strange because the creation of these nodes takes less than a second.

I've been able to optimize this a bit by doing a batching and separate threads solution like this one: Improve performance removing TinkerGraph vertices vertices

However, 10-15 seconds is still too long as I'm hoping to have this be a synchronous operation.

I've considered following something like this but that feels like overkill for dropping less than 5k elements...

To note, the size of the graph is around 110k vertices and 150k edges.

I've tried to profile the gremlin query but it seems that you can't profile through the JVM using:

g.V(ids).drop().iterate().profile()

I've tried various ways of writing the query for profiling but was unable to get it to work.

I'm hoping there is just something I'm missing that will help get this resolved.

  • JanusGraph has an inmemory backend that does not depend on TinkerGraph, so it might be interesting to compare performance figures for both on bulk dropping based on a list of vertex ids. – HadoopMarc Jan 21 '23 at 10:31
  • Do you see the same issues from the Gremlin console? Is it possible you are running into GC issues? You can profile the query, just remove the `iterate` before the `profile`. Do keep in mind also that when you drop a node, all incident edges also have to be dropped. This said 15 seconds seems a lot of time. – Kelvin Lawrence Jan 21 '23 at 16:28
  • I added an answer below with a few more ideas and showing some tests I performed. – Kelvin Lawrence Jan 21 '23 at 16:51

1 Answers1

1

As mentioned in comments, it definitely seems unusual that this operation is taking so long, unless the machine being used is very busy performing other tasks. Using my laptop (16GB RAM, modest CPU and other specs) I can drop the air-routes graph (3,747 nodes and 57,660 edges) in milliseconds time from the Gremlin console.

gremlin> Gremlin.version
==>3.6.0

gremlin> g
==>graphtraversalsource[tinkergraph[vertices:3747 edges:57660], standard]

gremlin> g.V().drop().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[])                                          3747        3747           6.226     7.52
DropStep                                                                                      76.587    92.48
                                            >TOTAL                     -           -          82.813        -

gremlin> g
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]   

I also tried dropping a list of 1000 nodes as follows but still experienced millisecond time.

gremlin> g
tinkergraph[vertices:3747 edges:57660]


gremlin> a=[] ; for (x in (1..1000)) {a << x}
==>null

gremlin> a.size()
==>1000

gremlin> g.V(a).drop().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1, 2, 3, 4, 5, 6, 7, 8,...                  1000        1000           2.677    13.87
DropStep                                                                                      16.626    86.13
                                            >TOTAL                     -           -          19.304        -

gremlin> g
==>graphtraversalsource[tinkergraph[vertices:2747 edges:9331], standard]   

Perhaps see if you can get a profile from your Java code using a query without iterate (it's not needed as profile is a terminal step). Also check for any unusual GC activity. I would also see if you see this same issue using the Gremlin Console. Something is definitely odd here. If none of these investigations bear fruit perhaps update the question to show the exact Java code you are using.

Kelvin Lawrence
  • 14,674
  • 2
  • 16
  • 38
  • Thank you for your response! I've run more extensive profiling and unfortunately am still unsure of the issue. Here are my findings. ** Methodology ** Ran JDK Mission Control Flight Recorder report over a period of a minute. During that minute, I reproduced the delete of 1250 vertices while taking thread dumps every 10 seconds. **Garbage Collection** No issue with gc, – pro-grammar Jan 23 '23 at 02:22
  • Profiling showed time 99% of time spent in the `TinkerVertex.remove()` method which 97.9% of is `ConcurrentHashMapNode.remove()` method. I have one dump where the it is locked on `replaceNode` , a sub method of `ConcurrentHashMapNode.remove()` that I intend to follow when I next work on this. But otherwise, there is no notability to the thread dumps other than they are in various spot of remove(). I sometimes see: `ElementHelper.hashCode(ElementHelper.java:451)` `at TinkerElement.hashCode(TinkerElement.java:40)` `at ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1110)` – pro-grammar Jan 23 '23 at 03:13
  • As a last effort, I just tried removing the indices that were configured. That took the deletion time from 12500 ms to 5 ms. Is there a good way I can diagnose why an index would cause such a substantial performance loss? I only had 2 indices on Vertex properties. – pro-grammar Jan 23 '23 at 18:02
  • Ah - very interesting. I did not know you were using an index. I can do some more experiments with an index in place and I'll update the current answer if anything else comes to light from that. – Kelvin Lawrence Jan 23 '23 at 18:14
  • Great, also to note, I'm using version 3.3.5. Perhaps there is an inefficiency that was corrected in later versions... – pro-grammar Jan 23 '23 at 18:18
  • OK - well, that is a very old version. TinkerPop has had many releases since then. Just the 3.5.x and 3.6.x lines are even maintained now. Are you able to test on a later version? I have been using 3.6.x for my testing. I'm not sure if anything was fixed and/or changed (regarding indexes or drop) since 3.3.5 but that was released in January 2019. – Kelvin Lawrence Jan 23 '23 at 18:28
  • Did a bunch of testing w/ different environment setups this afternoon with the gremlin console. Tried 3.3.11 (version I'm using), 3.5.4 and 3.6.1 with Java 11 or Java 8 (version I'm using). I found that w/out indexes, deletes of 3k-5k nodes never takes more than 100ms. With indexes, this jumps up to 10-12 seconds for 3k nodes and 16-25 seconds for 5000 nodes. No environments performed significantly different during the testing. This seems slow, I wonder if this is worth a bug report/ feature request? – pro-grammar Jan 25 '23 at 00:45
  • For sure, if you have a solid reproducer please do open a Jira ticket on the Apache TinkerPop project. – Kelvin Lawrence Jan 29 '23 at 00:51