
I'm trying to visualize a really huge network (3M nodes and 13M edges) stored in a database. For real-time interactivity, I plan to show only a portion of the graph based on user queries and expand it on demand. For instance, when a user clicks a node, I expand its neighborhood. (This is called "Search, Show Context, Expand on Demand" in this paper.)

I have looked into several visualization tools, including Gephi, D3, etc. They take a text file as input, but I have no idea how they can connect to a database and update the graph based on the user's interaction.

The linked paper implemented a system like that, but they didn't describe the tools they were using.

How can I visualize such data with the above criteria?

user4157124
Yang

1 Answer


There are several solutions out there, but basically every one of them uses the same approach:

  1. create a layer on top of your data source that lets you query it at a high level
  2. create a front-end layer that talks to the layer described above
  3. use the visualization tool you want
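
To make the first two layers a bit more concrete, here is a minimal sketch. It assumes a plain `edges(src, dst)` table and uses an in-memory SQLite database as a stand-in for the real source; the table layout, the function names and the `nodes`/`links` JSON shape are my own assumptions for illustration, not something a particular tool requires.

```python
import json
import sqlite3


def build_demo_db():
    """Create a tiny in-memory edge table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)],
    )
    # Indexes on both endpoints keep neighborhood lookups cheap on large tables.
    conn.execute("CREATE INDEX idx_src ON edges (src)")
    conn.execute("CREATE INDEX idx_dst ON edges (dst)")
    return conn


def neighborhood(conn, node_id, limit=100):
    """Layer 1: return the edges touching node_id, capped at `limit` edges."""
    rows = conn.execute(
        "SELECT src, dst FROM edges WHERE src = ? OR dst = ? LIMIT ?",
        (node_id, node_id, limit),
    ).fetchall()
    # Collect every endpoint, and keep the clicked node even if it is isolated.
    nodes = sorted({n for edge in rows for n in edge} | {node_id})
    return {
        "nodes": [{"id": n} for n in nodes],
        "links": [{"source": s, "target": t} for s, t in rows],
    }


def expand_response(conn, node_id):
    """Layer 2: serialize the subgraph as JSON for the front end."""
    return json.dumps(neighborhood(conn, node_id))
```

The front end (layer 3) would call something like `expand_response` whenever a node is clicked and merge the returned nodes and links into what is already drawn.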

As miro marchi pointed out, there are several solutions to achieve this goal: some of them are locked to particular data sources, others give much more freedom but require some coding skills.

Datasource

I would start with the choice of the source type: given the type of data, I would probably choose either Neo4J, Titan or OrientDB (if you fancy something more exotic with some sort of flexibility). All of them offer a JSON REST API, the first with a proprietary system and language (Cypher) and the other two using the Blueprints / Rexster system. Neo4J supports the Blueprints stack as well if you prefer Gremlin over Cypher.
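
As a rough illustration of what "expand a node" could look like with Cypher, the sketch below only builds the JSON body for a parameterized query in the `statements` format used by Neo4J's transactional HTTP endpoint. The function name and the returned column names are hypothetical, the endpoint path depends on the Neo4J version and is deliberately left out, and the `$id` parameter syntax assumes a recent release (2.x-era servers wrote parameters as `{id}`).

```python
import json


def neighbors_statement(node_id, limit=50):
    """Build a request body holding one parameterized Cypher query.

    Hypothetical helper: it only creates the payload and does not talk
    to any server.
    """
    cypher = (
        "MATCH (n)-[r]-(m) WHERE id(n) = $id "
        "RETURN id(n) AS source, id(m) AS target LIMIT $limit"
    )
    return {
        "statements": [
            {
                "statement": cypher,
                # Parameters keep user input out of the query text.
                "parameters": {"id": node_id, "limit": limit},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(neighbors_statement(42), indent=2))
```

Passing the node id as a parameter rather than pasting it into the query string lets the server reuse the query plan across clicks.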

For other solutions, such as other NoSQL or SQL databases, you will probably have to code a layer on top with its own REST API. It will work as well, but I wouldn't recommend that for the kind of data you have.
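
If you do go down that road, the REST layer can be very small. Below is a sketch of one as a plain WSGI app using only the standard library. The `ADJACENCY` dictionary is a hypothetical stand-in for the real database query, and the `?node=<id>` URL scheme is just an assumption for the example.

```python
import json
from urllib.parse import parse_qs

# Hypothetical adjacency data; a real service would query the database here.
ADJACENCY = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}


def app(environ, start_response):
    """WSGI app: GET /?node=<id> returns that node's neighbors as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    node = query.get("node", [None])[0]
    headers = [("Content-Type", "application/json")]
    if node not in ADJACENCY:
        # Missing or unknown node id.
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "unknown node"}).encode("utf-8")]
    body = {"node": node, "neighbors": ADJACENCY[node]}
    start_response("200 OK", headers)
    return [json.dumps(body).encode("utf-8")]
```

For local experiments it can be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`; the front end then requests `/?node=a` each time a node is clicked.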

Now, only the third point is left and here you have several choices.

Generic Viz tools

  • Sigma.js is a free and open source tool for graph visualization, and quite a nice one. As far as I know, Linkurious uses a forked version of it in their product.

  • Keylines is a commercial graph visualization tool with advanced styling, analytics and layouts, and they provide copy/paste demos if you are using Neo4J or Titan. It is not free, but it does support even older browsers - IE7 onwards...

  • VivaGraph is another free and open source graph visualization tool, but it has a smaller community compared to SigmaJS.

  • D3.js is the factotum for data visualization: you can do basically every kind of visualization with it, but the learning curve is quite steep.

  • Gephi is another free and open source solution, this time for the desktop. You will probably have to use an external plugin with it, but it does support most of the formats out there - graphML, CSV, Neo4J, etc...
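
Since CSV is among the formats mentioned for Gephi, one low-tech way to hand it a queried subgraph is an edge table. The sketch below writes one with `Source` and `Target` header columns, which is the layout I would expect Gephi's spreadsheet import to pick up as an edge list; treat that, and the function name, as assumptions to check against your Gephi version.

```python
import csv
import io


def edges_to_csv(edges):
    """Write (source, target) pairs as a CSV edge table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    # Example: a three-node path exported as an edge list.
    print(edges_to_csv([(1, 2), (2, 3)]))
```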

Vendor specific

  • Linkurious is a commercial, Neo4J-specific, complete tool to search/investigate data.

  • Neo4J web-admin console - even if it's basic, it has improved a lot with the newer 2.x.x versions, and it is based on D3.js.

There are also other solutions that I probably forgot to mention, but the ones above should offer a good variety.

Other notes

The JS tools above will visualize well up to 1500/2000 nodes at once, due to JS limits.
If you want to visualize bigger graphs - while expanding - I would recommend desktop solutions such as Gephi.
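
One way to stay under that kind of limit while still expanding on demand is to keep a node budget on the client side. The sketch below is only an illustration of the idea: the `VisibleGraph` class and its default budget of 1500 are my own assumptions, and it expects the clicked node to be on screen already.

```python
class VisibleGraph:
    """Tracks which nodes are on screen and enforces a node budget."""

    def __init__(self, max_nodes=1500):
        self.max_nodes = max_nodes
        self.nodes = set()
        self.edges = set()

    def expand(self, node, neighbors):
        """Add `node` and as many of its neighbors as the budget allows.

        Returns the neighbors that were left out, so the UI can show a
        "more..." hint instead of silently dropping them.
        """
        self.nodes.add(node)
        skipped = []
        for other in neighbors:
            if other not in self.nodes and len(self.nodes) >= self.max_nodes:
                # Budget exhausted: do not draw this new node.
                skipped.append(other)
                continue
            self.nodes.add(other)
            self.edges.add((node, other))
        return skipped
```

Edges to nodes that are already visible are always added, since they do not increase the node count.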

Disclaimer

I'm part of the Keylines dev team.

Community
MarcoL
  • Yang, in the case of D3.js, there are readily available examples of displaying and hiding neighborhoods based on user interaction. I don't see D3.js as difficult to learn, by the way. It's an odd tool, though, in that using it involves dealing with HTML, SVG, Javascript, and D3's unusual conceptual model (which is convenient once grasped). Knowing a little of each tool is enough to get started, though, and there are good introductions in print and on the web, as well as many code examples. Dealing with a large number of nodes obviously introduces additional issues that MarcoL addresses. – Mars Sep 14 '14 at 15:18