I'm investigating a possible bug with partition scans using custom vertex IDs in DSE Graph. Selecting a vertex by its full ID works as expected, but retrieving a whole partition results in a full table scan (i.e. it triggers a graph scan warning).
Following this vertex label definition:
schema.vertexLabel('word_hoard').partitionKey('_partition').clusteringKey('wordhoard_id').create()
a traversal that specifies the full ID (partition key and clustering key):
g.V().hasLabel('word_hoard').has('_partition', 'localhost').has('wordhoard_id', '60bcaeff-f6e5-11e5-9ce9-00aaaaaaaaaa')
leads to efficient CQL that makes sense:
SELECT * FROM topics_dev.word_hoard_p
WHERE "_partition" = 'localhost' AND wordhoard_id = 60bcaeff-f6e5-11e5-9ce9-00aaaaaaaaaa;
The traversal that specifies only the partition key:
g.V().hasLabel('word_hoard').has('_partition', 'localhost')
however, generates CQL that does not appear to use the partition key at all:
SELECT "_partition", "wordhoard_id" FROM "topics_dev"."word_hoard_p"
WHERE "~~vertex_exists" = true
To avoid this unnecessary table scan, I would expect something like:
SELECT * FROM topics_dev.word_hoard_p
WHERE "_partition" = 'localhost';
This CQL query performs well when run directly, but I cannot seem to generate it with a Gremlin traversal.
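For anyone trying to reproduce this, one way to compare the two traversals is TinkerPop's profile() step. This is only a minimal sketch: it assumes a Gremlin console or DSE Studio session in which g is already bound to this graph's traversal source, and that the profile output on your DSE version lists the CQL statements issued (treat that as an assumption, not a guarantee).

```groovy
// Assumes `g` is already bound to the graph that holds the 'word_hoard' label.

// Full-ID lookup: expected to read a single row from a single partition.
g.V().hasLabel('word_hoard').
  has('_partition', 'localhost').
  has('wordhoard_id', '60bcaeff-f6e5-11e5-9ce9-00aaaaaaaaaa').
  profile()

// Partition-only lookup: the traversal that triggers the graph scan warning.
g.V().hasLabel('word_hoard').
  has('_partition', 'localhost').
  profile()
```

Comparing the two profiles side by side should show whether the second traversal is pushed down as a partition-restricted query or falls back to a scan.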
Does anyone have experience with this issue?
Should I approach it differently, or is this a genuine bug in DSE or TinkerPop?
UPDATE 2018-10-30: This issue still exists as of DSE 6.0.4.
UPDATE 2019-10-19: A solution is available for testing in the DataStax Labs graph engine (experimental; non-production): https://community.datastax.com/answers/1150/view.html