5

I have created a basic implementation of a high-level client over Neo4j (https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-neo4j) and want to compare its performance with the native Neo4j driver (and maybe Spring Data too). That way I can determine how much overhead my library adds on top of the native driver.

I plan to create an extension of YCSB for Neo4j.

My question is: what should be considered the basic unit of data written to Neo4j? Should it be a single node, or a pair of nodes joined by an edge? What is the current practice in the Neo4j world, and how do people who benchmark Neo4j performance do it?

Dev
Amresh
  • 1
    Slightly OT, but I remember some articles about graph DB benchmarking in general that might help: https://code.google.com/p/orient/wiki/GraphDBComparison and http://ups.savba.sk/~marek/gbench.html – ulkas Mar 01 '13 at 08:47

3 Answers

4

There has already been some work on benchmarking Neo4j with Gatling: http://maxdemarzi.com/2013/02/14/neo4j-and-gatling-sitting-in-a-tree-performance-t-e-s-t-ing/

You could maybe adapt it.

Stephane Landelle
4

See graphdb-benchmarks

The graphdb-benchmarks project is a benchmark of popular graph databases. Currently the framework supports Titan, OrientDB, Neo4j and Sparksee. The purpose of this benchmark is to examine the performance of each graph database in terms of execution time. The benchmark is composed of four workloads: Clustering, Massive Insertion, Single Insertion and Query Workload. Every workload has been designed to simulate common operations in graph database systems.

Clustering Workload (CW): CW consists of a well-known community detection algorithm for modularity optimization, the Louvain Method. We adapt the algorithm on top of the benchmarked graph databases and employ cache techniques to take advantage of both graph database capabilities and in-memory execution speed. We measure the time the algorithm needs to converge.

Massive Insertion Workload (MIW): Create the graph database and configure it for massive loading, then populate it with a particular dataset. We measure the time needed to create the whole graph.

Single Insertion Workload (SIW): Create the graph database and load it with a particular dataset. Every object insertion (node or edge) is committed directly and the graph is constructed incrementally. We measure the insertion time per block, which consists of one thousand edges and the nodes that appear during the insertion of these edges.

Query Workload (QW): Execute three common queries: FindNeighbours (FN): finds the neighbours of all nodes. FindAdjacentNodes (FA): finds the adjacent nodes of all edges. FindShortestPath (FS): finds the shortest path between the first node and 100 randomly picked nodes.
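To make the measurement side of these workloads concrete, here is a minimal sketch of the per-block timing described for SIW and of a shortest-path query like FS. It is not code from graphdb-benchmarks: `InMemoryGraph`, `timed_single_insertion` and `shortest_path_length` are hypothetical names, and the in-memory dictionary only stands in for a real database client, so there is no per-insertion commit here. Python is used purely to keep the sketch free of any database dependency.

```python
import time
from collections import deque


class InMemoryGraph:
    """Hypothetical stand-in for a graph database client (not a Neo4j API)."""

    def __init__(self):
        self.adjacency = {}  # node id -> set of neighbour ids

    def add_edge(self, source, target):
        # Nodes are created the first time they appear in an edge,
        # mirroring "the nodes that appear during the insertion of these edges".
        self.adjacency.setdefault(source, set()).add(target)
        self.adjacency.setdefault(target, set()).add(source)

    def neighbours(self, node):
        return self.adjacency.get(node, set())


def timed_single_insertion(graph, edges, block_size=1000):
    """Insert edges one at a time; return elapsed seconds for each block."""
    block_times = []
    count = 0
    block_start = time.perf_counter()
    for source, target in edges:
        graph.add_edge(source, target)
        count += 1
        if count % block_size == 0:
            now = time.perf_counter()
            block_times.append(now - block_start)
            block_start = now
    if count % block_size:
        # Record the final, partially filled block as well.
        block_times.append(time.perf_counter() - block_start)
    return block_times


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.neighbours(node):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


if __name__ == "__main__":
    graph = InMemoryGraph()
    chain = [(i, i + 1) for i in range(2500)]  # synthetic dataset, an assumption
    for index, seconds in enumerate(timed_single_insertion(graph, chain)):
        print(f"block {index}: {seconds:.6f} s")
    print("hops from 0 to 2500:", shortest_path_length(graph, 0, 2500))
```

To compare a high-level client with a native driver, the same harness could be run twice, swapping the stand-in for an adapter around each client, so that the difference in per-block times approximates the overhead.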

Somnath Muluk
1

One way to performance-test is with a tool such as Gatling (http://gatling-tool.org/). Work is also underway at http://ldbc.eu to create benchmark frameworks. Beyond that, benchmarking depends heavily on your domain dataset and the queries you want to run. Maybe you could start at https://github.com/neo4j/performance-benchmark and improve on it?

Peter Neubauer