
My use case is a graph of several hundred million vertices (say 100M to 1B). Each vertex has a set of 10 properties, which are essentially scores computed from the weights of the vertex's edges and the scores of the adjacent vertices. When adding (or removing) vertices, the scores of all vertices potentially need to be recomputed. This doesn't need to happen in real time, so it is definitely an OLAP/batch use case. There are also some very simple graph OLTP requirements, which are basically just reading the scores of a given vertex and its adjacent vertices.

I am trying to determine whether I should go with either of the following approaches:

1. Giraph: this would imply exporting the whole graph to a file, loading it into Giraph, and then loading the results back into whatever datastore is used to persist the graph (Neo4j, Neptune, JanusGraph, HBase, an RDBMS...).
2. TinkerPop3's GraphComputer: if I understand correctly, I could run the OLAP graph update algorithm directly on a TinkerPop3-compatible graph DB (JanusGraph, Neptune, other?), and thus cover both the OLAP and OLTP use cases with a single tool, without having to do additional data import/export.
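For context, both Giraph and TinkerPop's GraphComputer implement the same vertex-centric BSP ("think like a vertex") model: in each superstep, every vertex recomputes its state from its edge weights and its neighbors' states, until the values stabilize. Here is a minimal Python sketch of that model; the scoring rule (a damped, edge-weighted average of neighbor scores) is a hypothetical stand-in for the real score functions, not taken from the question:

```python
# Sketch of the vertex-centric BSP model used by Giraph / GraphComputer.
# The scoring rule below is a hypothetical example, not the questioner's
# actual score functions.

def recompute_scores(edges, scores, damping=0.85, max_supersteps=30, tol=1e-6):
    """edges: dict vertex -> list of (neighbor, weight) pairs.
    scores: dict vertex -> initial score.
    Returns the converged scores after iterated supersteps."""
    scores = dict(scores)
    for _ in range(max_supersteps):
        new_scores = {}
        for v, nbrs in edges.items():
            total_w = sum(w for _, w in nbrs) or 1.0
            # Each vertex reads its neighbors' scores from the previous
            # superstep (BSP semantics: no intra-superstep visibility).
            neighbor_avg = sum(scores[u] * w for u, w in nbrs) / total_w
            new_scores[v] = (1 - damping) * scores[v] + damping * neighbor_avg
        delta = max(abs(new_scores[v] - scores[v]) for v in scores)
        scores = new_scores
        if delta < tol:
            break  # converged: no vertex changed meaningfully this superstep
    return scores

# Tiny toy graph: "a" is connected to "b" (weight 1) and "c" (weight 2).
edges = {
    "a": [("b", 1.0), ("c", 2.0)],
    "b": [("a", 1.0)],
    "c": [("a", 2.0)],
}
scores = recompute_scores(edges, {"a": 1.0, "b": 0.0, "c": 0.0})
```

In a real Giraph or SparkGraphComputer job the inner loop would be distributed across workers, with neighbor scores delivered as messages between supersteps rather than read from a shared dict.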

    After analysis, we've decided to go with JanusGraph and its TinkerPop-compatible implementation. We'll leverage its SparkGraphComputer for OLAP processing. – Fabien Coppens Feb 21 '18 at 18:41

1 Answer


If you are not yet getting the graph OLAP performance you need, or if moving data to Spark is proving slow or cumbersome, I suggest you take a look at AnzoGraph. It was built by the same team that built Netezza and ParAccel/Redshift.

AnzoGraph is a from-the-ground-up C/C++ HPC implementation of a massively parallel processing native graph OLAP (GOLAP) engine - i.e., data-warehouse-style interactive or batch reporting, analytics, and aggregation of graph data. It is very high performance and scales linearly on commodity computers, so it will handle the data set you mention (you may not even need a cluster for data of that size). At the time of writing it does not support TinkerPop/Gremlin, which may be a problem for you. It does support SPARQL 1.1 as well as RDF* (property graph support that is not yet part of the W3C SPARQL standard), plus many additional extension and aggregate functions needed for regular analytics. It also supports inference, named queries, views, various graph algorithms, etc.

Disclaimer: I work for Cambridge Semantics.

Sean Martin