
I'm investigating a possible bug with partition scans using custom vertex IDs in DSE Graph. Selecting a vertex by its full ID works as expected, but retrieving the whole partition results in a full table scan (i.e. it triggers a graph scan warning).

Given this vertex label definition:

schema.vertexLabel('word_hoard').partitionKey('_partition').clusteringKey('wordhoard_id').create()

the traversal

g.V().hasLabel('word_hoard').has('_partition', 'localhost').has('wordhoard_id', '60bcaeff-f6e5-11e5-9ce9-00aaaaaaaaaa')

leads to efficient CQL that makes sense:

SELECT * FROM topics_dev.word_hoard_p 
  WHERE "_partition" = 'localhost' AND wordhoard_id = 60bcaeff-f6e5-11e5-9ce9-00aaaaaaaaaa;

The partition-only traversal

g.V().hasLabel('word_hoard').has('_partition', 'localhost')

however, generates CQL that appears to ignore the partition key:

SELECT "_partition", "wordhoard_id" FROM "topics_dev"."word_hoard_p"
  WHERE "~~vertex_exists" = true
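
For anyone trying to reproduce this: one way to inspect the CQL behind a traversal is to append TinkerPop's profile() step. This is a minimal sketch, assuming the DSE Graph profiler output lists the CQL statements it issued; the exact output format may differ between versions.

```groovy
// Run in the Gremlin console or DSE Studio against the schema defined above.
// profile() is a standard TinkerPop step; the CQL listing in its output is
// assumed to be DSE-specific behaviour and may vary by version.
g.V().hasLabel('word_hoard').has('_partition', 'localhost').profile()
```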

To avoid this unnecessary table scan, I would expect something like:

SELECT * FROM topics_dev.word_hoard_p
  WHERE "_partition" = 'localhost';

This CQL query performs well, but I cannot seem to generate it with a Gremlin traversal.

Does anyone have experience with this issue?

Should I approach it differently, or is this a genuine bug in DSE or TinkerPop?

UPDATE 2018-10-30: this issue still exists as of DSE 6.0.4

UPDATE 2019-10-19: a solution is available for testing in the DataStax Labs graph engine (experimental; non-production): https://community.datastax.com/answers/1150/view.html

AliOli
  • Which version of DSE do you use? – jbmusso Nov 23 '17 at 14:55
  • This is on DSE 5.1.3. I'm pretty sure I got the same behavior on 5.1.2 and have never used custom partition keys in earlier versions. – AliOli Nov 23 '17 at 18:29
  • Does the issue happen if you use g.V().has('word_hoard').has('_partition', 'localhost')? I'm looking through JIRAs as well to see if this is a known issue. – jlacefie Nov 27 '17 at 12:32
  • Thanks for looking into this @jlacefie. Yes, the issue happens as well when using `.has('word_hoard')` instead of `.hasLabel('word_hoard')`. Actually, it is a lot worse, because this way all defined vertex labels are scanned. Were you able to reproduce the issue? I'm testing this on a DSE 5.1.3 install with default configuration. – AliOli Nov 27 '17 at 16:55
