
I have asked this question before, but I am asking again with a concrete example.

I have DSE Graph running locally on my Mac. Below is the simplest possible vertex-creation traversal:

g.addV("company").property("id", companyId)
.property("name", "company_" + companyId)
.property(VertexProperty.Cardinality.list, "domainurls", "test.com", "anothertest.com")
.next();

I am using the Java TinkerPop3 API to make the calls, and I obtain a DseSession this way:

dseCluster = DseCluster.builder()
        .addContactPoints(contactPoints)
        .withGraphOptions(new GraphOptions().setGraphName("profilex_dev"))
        .build();
dseSession = dseCluster.connect();
GraphTraversalSource g = DseGraph.traversal(dseSession);

I am re-using this single GraphTraversalSource instance in a multi-threaded application, roughly along the lines of the sketch below. My observation: the more threads there are, the slower the response times become.
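A minimal sketch of that usage pattern (the contact point, executor setup, thread count, and company ids here are illustrative; the actual benchmark is the JMH test linked in the comments below):

// Minimal sketch of sharing one GraphTraversalSource across threads.
// The executor setup, thread count and company ids are illustrative,
// not the exact benchmark code.
import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;
import com.datastax.driver.dse.graph.GraphOptions;
import com.datastax.dse.graph.api.DseGraph;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedTraversalSourceSketch {

    public static void main(String[] args) throws InterruptedException {
        DseCluster dseCluster = DseCluster.builder()
                .addContactPoint("127.0.0.1")
                .withGraphOptions(new GraphOptions().setGraphName("profilex_dev"))
                .build();
        DseSession dseSession = dseCluster.connect();

        // One traversal source, reused by every worker thread.
        GraphTraversalSource g = DseGraph.traversal(dseSession);

        int threads = 200; // varied between 10 and 200 in the measurements
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            String companyId = "company-" + i;
            pool.submit(() -> g.addV("company")
                    .property("id", companyId)
                    .property("name", "company_" + companyId)
                    .next());
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        dseCluster.close();
    }
}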

I measured using the Java Microbenchmark Harness (JMH), and below is roughly what I found:

  • 10 Threads - 6 ms
  • 50 Threads - 34 ms
  • 200 Threads - 146 ms.

So my question is: is there a way to optimize this to run faster, e.g. any pooling options that need to be set? In my case there is much more going on than a single company creation, with more graph mutations and queries (around 10 such traversals), so the overall response time becomes sub-optimal as the number of threads grows.

Note that the above response times apply to simple graph queries as well, so even simple reads get slower as the thread count goes up (and are of course very good when the number of threads is low).

Sathyakumar Seshachalam
    Can you add the code of mutations/queries? Sometimes the performance problem could be solved there... – Alex Ott Mar 12 '18 at 15:56
  • Another question - what version of driver are you using? – Alex Ott Mar 14 '18 at 00:51
  • Maybe using the traversal source this way makes things slower, you could try to switch to GraphStatements and use the method `DseGraph.statementFromTraversal()` instead of iterating the traversal directly, and execute statements through the session. – newkek Mar 14 '18 at 16:01
  • If this does not change then you would want to check the inFlight requests from the driver (https://docs.datastax.com/en/developer/java-driver-dse/1.5/manual/pooling/#monitoring-and-tuning-the-pool). If the inFlight goes up when increasing the number of threads, it means ultimately it's a DseGraph server side performance issue that the driver can't really work around. One solution is to batch inserts in the same traversal, like `g.addV().property().addV().property().....` – newkek Mar 14 '18 at 16:02
  • @newkek: Sorry for the delay; monitoring the in-flight requests did not help. The in-flight count never exceeds 200 (200 threads was what I was running) and max load = 1024. Note that this is the case even when I just get a single vertex `g.V(vertexId)`. So I am guessing this is most likely a client-side configuration issue – Sathyakumar Seshachalam Mar 19 '18 at 09:37
  • And the java driver version we use is 1.4.1 – Sathyakumar Seshachalam Mar 19 '18 at 10:02
  • @newkek: Also note that if I create a DseSession every time (`DseCluster.connect`) and then close it every time, I get better response times. I thought this was an anti-pattern and that I was supposed to re-use a single DseSession instance. – Sathyakumar Seshachalam Mar 19 '18 at 10:48
  • When do you close and open the DseSession? Indeed this is an anti-pattern... we recommend using the DseSession as a long lived object. Do you have a code sample you could provide that would reproduce the issue? – newkek Mar 19 '18 at 18:39
  • @newkek: Here it is https://www.dropbox.com/s/tqbdqv2lv0jhqqv/DSEVertexQueryPerformance.java?dl=0 – Sathyakumar Seshachalam Mar 20 '18 at 11:54
  • Thanks a lot that's very useful, will try to profile this – newkek Mar 21 '18 at 14:58
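Following up on the pool-monitoring suggestion in the comments above, here is a minimal sketch of how the driver's connection pool could be inspected and tuned with the DSE Java driver 1.x; the PoolingOptions values below are illustrative, not recommendations:

// Minimal sketch: inspecting in-flight requests and tuning PoolingOptions
// on the DSE Java driver 1.x. The thresholds below are illustrative only.
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;
import com.datastax.driver.dse.graph.GraphOptions;

public class PoolMonitoringSketch {

    public static void main(String[] args) {
        PoolingOptions poolingOptions = new PoolingOptions()
                // Allow more simultaneous requests per connection to local hosts.
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 1024)
                .setConnectionsPerHost(HostDistance.LOCAL, 1, 2);

        DseCluster.Builder builder = DseCluster.builder()
                .addContactPoint("127.0.0.1")
                .withGraphOptions(new GraphOptions().setGraphName("profilex_dev"));
        builder.withPoolingOptions(poolingOptions); // inherited from the core driver's Cluster.Builder
        DseCluster cluster = builder.build();
        DseSession session = cluster.connect();

        // Sample the pool state (would normally be done periodically while the benchmark runs).
        Session.State state = session.getState();
        state.getConnectedHosts().forEach(host ->
                System.out.printf("%s: %d connections, %d in-flight requests%n",
                        host,
                        state.getOpenConnections(host),
                        state.getInFlightQueries(host)));

        cluster.close();
    }
}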

1 Answer


Not sure if this should be posted as an "answer", but the formatting is easier here than in a comment. Thanks for providing a full test class; it was helpful for debugging and experimenting.

I have been looking at your test implementation, and indeed I was able to observe a decrease in throughput as the concurrency level increased.

I believe that what you are seeing is a side effect of your local server node not having been exercised beforehand, so the server caches are cold and not active enough. You need longer warmup phases for your local node to start responding fast enough; once it is warm, you can see that the individual request times do not increase under higher concurrency.

I ran your tests (just the single-session ones) while my system was under low usage, and when I start your tests cold the results look very similar to yours.

However, if I put some additional warmup runs before executing the tests with 10, 20, 50, 100 and 200 requests, like so:

    dseVertexQueryPerformance.testSingleSession(3000);
    dseVertexQueryPerformance.testSingleSession(6000);
    dseVertexQueryPerformance.testSingleSession(6000);
    dseVertexQueryPerformance.testSingleSession(6000);
    dseVertexQueryPerformance.testSingleSession(6000);
    dseVertexQueryPerformance.testSingleSession(9000);
    dseVertexQueryPerformance.testSingleSession(9000);
    dseVertexQueryPerformance.testSingleSession(9000);
    dseVertexQueryPerformance.testSingleSession(10);
    dseVertexQueryPerformance.testSingleSession(20);
    dseVertexQueryPerformance.testSingleSession(50);
    dseVertexQueryPerformance.testSingleSession(100);
    dseVertexQueryPerformance.testSingleSession(200);

I eventually get results like:

End Test ::SingleSession, Average Time: 2.6, Total execution time: 5.891593 ms
For 10 threads, ::SingleSession took 2.6
End Test ::SingleSession, Average Time: 4.4, Total execution time: 7.830533 ms
For 20 threads, ::SingleSession took 4.4
End Test ::SingleSession, Average Time: 1.86, Total execution time: 20.378055 ms
For 50 threads, ::SingleSession took 1.86
End Test ::SingleSession, Average Time: 1.98, Total execution time: 47.487505 ms
For 100 threads, ::SingleSession took 1.98
End Test ::SingleSession, Average Time: 2.295, Total execution time: 92.793991 ms
For 200 threads, ::SingleSession took 2.295

We can see that the contention was simply due to the fact that DSE Graph, or the system overall, was not warm enough.


Now, looking back at those longer runs mentioned above, I do observe some additional contention, or extra work, when iterating the connected traversal source directly.

Instead of iterating the traversal source directly, I would recommend creating a GraphStatement out of a traversal, as explained here: https://docs.datastax.com/en/developer/java-driver-dse/1.5/manual/tinkerpop/#data-stax-drivers-execution-compatibility

So in your tests, I have changed the getAVertex(GraphTraversalSource g) to:

public class DSEVertexQueryPerformance {
    ...

    private GraphTraversalSource traversalSource = DseGraph.traversal();

    ....

    public long getAVertex(DseSession dseSession) {
        Instant begin = Instant.now();
        dseSession.executeGraph(DseGraph.statementFromTraversal(
            traversalSource.V(vertexId))
        );
        return Duration.between(begin, Instant.now()).toMillis();
    }
}

By changing this method I was able to go from a ~1900 ms total execution time for, say, 9000 requests (dseVertexQueryPerformance.testSingleSession(9000);) with your current g.V().next() version, to a total execution time of ~1400 ms using the statementFromTraversal() method.

Finally, I'd also recommend using the asynchronous query execution methods (DseSession.executeGraphAsync()), as that would allow you to parallelize all your requests without needing thread pools on the client side, which ultimately puts less pressure on the client application.
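For illustration, here is a minimal sketch of what that asynchronous variant could look like; the vertexId, request count, and Guava-based future handling are assumptions for the example, not the exact benchmark code:

// Minimal sketch of issuing graph queries asynchronously with
// DseSession.executeGraphAsync(); the vertexId and request count are illustrative.
import com.datastax.driver.dse.DseSession;
import com.datastax.driver.dse.graph.GraphResultSet;
import com.datastax.dse.graph.api.DseGraph;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class AsyncGraphQuerySketch {

    private final GraphTraversalSource traversalSource = DseGraph.traversal();

    public void queryVertices(DseSession dseSession, Object vertexId, int requests)
            throws ExecutionException, InterruptedException {
        List<ListenableFuture<GraphResultSet>> futures = new ArrayList<>();

        // Fire all requests without blocking; the driver multiplexes them
        // over its connection pool, no client-side thread pool required.
        for (int i = 0; i < requests; i++) {
            futures.add(dseSession.executeGraphAsync(
                    DseGraph.statementFromTraversal(traversalSource.V(vertexId))));
        }

        // Wait for all of them to complete (could also attach callbacks instead).
        Futures.allAsList(futures).get();
    }
}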

newkek
  • Thanks for the detailed reply. I do have a question though: is it possible to pre-warm a server and build up the cache beforehand? I will continue to check whether using multiple VMs, one warming up the cache and the other doing the regular 10/20/50/100 test, will work – Sathyakumar Seshachalam Mar 23 '18 at 04:18