I've run into an issue using plain old TinkerGraph to drop a moderately sized set of vertices. In total, about 1,250 vertices and 2,500 edges will be dropped.
When running the following:
g.V(ids).drop().iterate()
It takes around 20-30 seconds, which seems excessive, and as far as I can verify the time is spent on nothing other than the removal of the vertices.
I'm hoping there is some key piece that I am missing or an area I have yet to explore that will help me out here.
The environment is not memory or CPU constrained in any way. I've profiled the code and see that the majority of the time is spent in the TinkerVertex.remove
method. This is doubly strange because creating these vertices takes less than a second.
I've been able to optimize this a bit with a batching-and-separate-threads solution like this one: Improve performance removing TinkerGraph vertices
However, 10-15 seconds is still too long as I'm hoping to have this be a synchronous operation.
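For context, the batching approach I tried looks roughly like the sketch below. The class and method names here are illustrative, not from a library; with the real graph, the consumer passed in would be the drop traversal shown in the comment:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchDrop {

    // Split the id list into fixed-size batches.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Submit one drop task per batch and block until all of them finish.
    static <T> void dropInParallel(List<T> ids, int batchSize, int threads,
                                   Consumer<List<T>> dropBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<T> batch : partition(ids, batchSize)) {
            // With the real graph the consumer would be something like:
            //   batch -> g.V(batch.toArray()).drop().iterate()
            pool.submit(() -> dropBatch.accept(batch));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```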
I've considered following something like this but that feels like overkill for dropping less than 5k elements...
For reference, the graph as a whole is around 110k vertices and 150k edges.
I've tried to profile the gremlin query but it seems that you can't profile through the JVM using:
g.V(ids).drop().iterate().profile()
I've tried various ways of writing the query for profiling but was unable to get it to work.
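One form I haven't ruled out is placing profile() before the terminal step, since iterate() exhausts the traversal before profile() is ever applied; if I understand the API correctly, something like this should return a TraversalMetrics object:
g.V(ids).drop().profile().next()
So far, though, I haven't gotten usable metrics out of it.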
I'm hoping there is just something I'm missing that will help get this resolved.