This is a somewhat abstract and general question. I'm interested in the inherent (as well as implementation-specific) properties of different approaches to persisting unstructured data that has both lots of internal references (graph-like) and lots of properties (JSON-like).
Since every tree is a graph, you can view graph DBs (e.g. Neo4j) as a superset of document DBs (e.g. MongoDB): a graph DB provides all the functionality of a document DB, but additionally allows cycles and has a native reference (edge) type, so you don't have to dereference foreign keys/ids manually. Is there some tipping point, as you add more references between your objects/resources, where you become better off with a graph DB than with a document store? Are there advantages to document DBs (storage space, performance?), or should you always go with a graph DB just in case you need more references later?
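To make the "manual dereferencing" point concrete, here is a toy sketch in plain Python (no real MongoDB/Neo4j APIs, made-up data): in the document style every hop is an explicit lookup by id, while in the graph style the relationship itself is the pointer.

```python
# Toy illustration only, not actual driver calls.

# Document-store style: references are just ids stored inside JSON-like docs.
users = {
    "u1": {"name": "Alice", "follows": ["u2", "u3"]},
    "u2": {"name": "Bob",   "follows": ["u3"]},
    "u3": {"name": "Carol", "follows": []},
}

def followed_names_doc(user_id):
    # Every hop is a lookup by foreign key (in a real DB: an extra query
    # or an application-side join).
    return [users[fid]["name"] for fid in users[user_id]["follows"]]

# Graph style: the edge is a first-class pointer; no id lookup per hop.
class Node:
    def __init__(self, name):
        self.name = name
        self.follows = []   # direct references to other Node objects

alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.follows = [bob, carol]
bob.follows = [carol]

def followed_names_graph(node):
    return [n.name for n in node.follows]

print(followed_names_doc("u1"))      # ['Bob', 'Carol']
print(followed_names_graph(alice))   # ['Bob', 'Carol']
```

The tipping point I'm asking about is presumably where these application-side joins (extra queries per hop) start to dominate.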
Similarly, how do graph DBs and triplestores (e.g. RDF stores) compare? Property-graph DBs (where nodes and edges can carry properties) seem to be a superset of plain triplestores. So for which problems (if any) do triplestores actually perform better than, say, Neo4j? (One advantage of RDF stores is the standardized query language, SPARQL, although plenty of people dislike SPARQL and would therefore count it as a disadvantage.)
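Here is what I mean by "superset", again as a toy Python sketch with made-up identifiers (not real RDF or Cypher syntax): a property graph can hang attributes directly off an edge, whereas with plain subject-predicate-object triples you have to introduce an intermediate resource to say anything about the relationship itself.

```python
# Property-graph style: the edge itself carries properties.
edge = {
    "from": "alice",
    "to": "acme_corp",
    "type": "WORKS_FOR",
    "properties": {"since": 2019, "role": "engineer"},
}

# Plain triples: only subject-predicate-object facts, so attributes of the
# relationship need an intermediate node (here ':employment1').
triples = [
    ("alice",        ":hasEmployment", ":employment1"),
    (":employment1", ":employer",      "acme_corp"),
    (":employment1", ":since",         2019),
    (":employment1", ":role",          "engineer"),
]

# Same information, but the triple version needs an extra hop per edge
# property when querying, which is part of what I'd like to see quantified.
def employer_of(person):
    emp = next(o for s, p, o in triples if s == person and p == ":hasEmployment")
    return next(o for s, p, o in triples if s == emp and p == ":employer")

print(employer_of("alice"))  # acme_corp
```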
I guess my question is: the property-graph model seems able to express all kinds of data neatly, so what is the catch in practice? I suppose the catch is performance, so I'd love to see some numbers or rules of thumb on what kind of slowdowns to expect when loading, querying and modifying data, as well as on memory and persistent storage requirements (compared to document stores and triplestores). And what about horizontal scalability? My impression is that the playing field is fairly level there.
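On the scalability point, this is the kind of effect I would want numbers for (toy Python, random data, naive hash partitioning chosen arbitrarily): documents shard cleanly by key, but once nodes are spread over shards, many edges end up crossing shard boundaries, and presumably each cross-shard hop costs a network round trip.

```python
import random

random.seed(0)
NUM_NODES, NUM_EDGES, NUM_SHARDS = 10_000, 50_000, 4

# Random toy graph: edges between arbitrary node ids.
edges = [(random.randrange(NUM_NODES), random.randrange(NUM_NODES))
         for _ in range(NUM_EDGES)]

# Naive sharding: place each node on a shard by hashing its id.
shard_of = lambda node: hash(node) % NUM_SHARDS

cross_shard = sum(1 for a, b in edges if shard_of(a) != shard_of(b))
print(f"{cross_shard / NUM_EDGES:.0%} of edges cross shard boundaries")
# With 4 shards and random placement, roughly 75% of edges are cross-shard,
# so most traversals would have to leave the local shard.
```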
Do you think it is possible that graphs, with their expressiveness, will become the new default storage model for projects that don't have huge amounts of data, or are we doomed to a decade of polyglot persistence, with RDBMSs, JSON stores and graph DBs living alongside each other and being integrated with ever more glue code?