@Michael, I am happy that you stepped in, as you definitely know more than me on this :). I am on a learning journey at this point. At your request, here is one of the papers that inspired my understanding:
arxiv.org/abs/1801.02911 (SPARQL querying of Property Graphs using
Gremlin Traversals)
I quote them:
"We present a comprehensive empirical evaluation of Gremlinator and
demonstrate its validity and applicability by executing SPARQL queries
on top of the leading graph stores Neo4J, Sparksee and Apache
TinkerGraph and compare the performance with the RDF stores Virtuoso,
4Store and JenaTDB. Our evaluation demonstrates the substantial
performance gain obtained by the Gremlin counterparts of the SPARQL
queries, especially for star-shaped and complex queries."
They explain, however, that the results depend to some extent on the type of query.
As another Stack Overflow answer puts it, "Comparison of Relational Databases and Graph Databases" would also help in understanding the difference between set-based and path-based processing. My understanding is that triple stores work with sets too. That said, I am definitely not aware of all the optimization techniques implemented in triple stores lately, and I have seen several papers explaining techniques to significantly prune set join operations.
On distribution, it is more of a gut feeling. For instance, doing join operations in a distributed fashion sounds very expensive to me. I don't have the papers at hand, and my research on the matter is not exhaustive. But from what I have read (I will have to dig into my Evernote :) to back it up), that is the fundamental problem with distribution. Automated smart sharding does not seem to alleviate the issue.
@Michael, this is a very complex subject. I am definitely on a learning journey, and that is why I am using Stack Overflow to guide my research. You probably have an idea as to why. So please feel free to provide pointers.
That said, I am not saying that there is a problem with RDF or that property graphs are better. I am saying that, when it comes to graph traversal, there are ways of implementing a backend that make it fast. The data model is not the issue here; the data structure used to support the traversal is. The second thing I am saying is that the choice of query language seems to influence how the "traversal" is performed, and hence which data structure is used to back the data model.
That's my understanding so far. And yes, I do understand that there are a lot of other factors at play, so feel free to enumerate some of them to guide my journey.
In short, my question comes down to this: is it possible to have RDF stores backed by a so-called native graph storage, and then implement SPARQL in terms of traversal steps rather than joins over sets as per its algebra? Wouldn't that make things a bit faster? It seems to me that this is somewhat the approach taken by https://github.com/graknlabs/grakn, which is primarily backed by JanusGraph for graph-like storage. Although it is not RDF, Graql is the same idea as having RDFS++ plus SPARQL. They claim to just do it better, about which I have my reservations, but that is not the fundamental question of this thread. The bottom line is that they back knowledge representation with the information retrieval style (path traversal) and the accompanying storage approach that property graphs championed. Let me be clear on this: I am not saying that native graph storage belongs to property graphs. It is just, in my mind, a storage approach optimized for storing graph structures where information retrieval involves (path) traversal: https://docs.janusgraph.org/latest/data-model.html.
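
To make the question more concrete, here is a minimal Python sketch of the two evaluation styles for a two-hop pattern such as `?a :knows ?b . ?b :knows ?c`. Everything in it is hypothetical (the toy triples, the `knows` predicate, the function names); it is not how any particular triple store or graph database is implemented, and a real store would use indexed or hash joins rather than the naive nested loop shown. It only illustrates that the same triples can be answered either by joining sets or by following an adjacency index.

```python
from collections import defaultdict

# Toy RDF-like data as (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "knows", "dave"),
    ("carol", "knows", "erin"),
    ("alice", "likes", "tea"),
]


def two_hop_by_join(triples, predicate):
    """Set-style evaluation: select the matching triples twice and join on ?b.

    Naive nested-loop join, for illustration only.
    """
    edges = [(s, o) for s, p, o in triples if p == predicate]
    return {(a, c) for a, b in edges for b2, c in edges if b == b2}


def build_adjacency(triples):
    """Adjacency index: subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[s][p].append(o)
    return index


def two_hop_by_traversal(index, start, predicate):
    """Traversal-style evaluation: start at one vertex and follow edges twice."""
    result = set()
    for b in index.get(start, {}).get(predicate, []):
        for c in index.get(b, {}).get(predicate, []):
            result.add((start, c))
    return result


if __name__ == "__main__":
    index = build_adjacency(TRIPLES)
    print(sorted(two_hop_by_join(TRIPLES, "knows")))
    print(sorted(two_hop_by_traversal(index, "alice", "knows")))
```

In this sketch the join version touches every `knows` triple regardless of where the query starts, while the traversal version only touches the neighbourhood of the starting vertex. Whether that difference survives real-world indexing and query optimization is exactly what I am asking about.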