I am looking at the LDBC benchmark which has contributions from Neo4j and TigerGraph. I want to understand how entries are ingested to measure performance.

Here are two example entries from "Person_likes_Post".

{"creationDate":1296583977045,"deletionDate":1577664000000,"explicitlyDeleted":false,"PersonId":13194139533355,"PostId":412316861128}
{"creationDate":1296750065049,"deletionDate":1296750075058,"explicitlyDeleted":true,"PersonId":13194139533355,"PostId":412316861129}

Does "explicitlyDeleted":true mean that only the edge is deleted?

When "explicitlyDeleted":false, does it mean that the source node is deleted, the destination node is deleted, or both?

Link to the benchmark doc:
https://ldbcouncil.org/ldbc_snb_docs/ldbc-snb-specification.pdf

Download link to the example LDBC dataset containing these entries:
https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-composite-merged-fk.zip

(I wanted to tag LDBC, but there is no such option.)

cpchung

1 Answer

The explicitlyDeleted attribute indicates whether there is a delete operation that specifically targets the given entity (i.e. a node or edge in the graph). This distinction is needed because the LDBC SNB workloads have cascading deletes, where the deletion of one entity may trigger the deletion of other entities.

For example, a Person_likes_Post edge can be deleted due to various explicit delete operations:

  1. an explicit delete operation targeting a single Person_likes_Post edge
  2. an explicit delete operation targeting its source Person
  3. an explicit delete operation targeting its target Post
  4. an explicit delete operation targeting a Forum that contains its target Post
  5. an explicit delete operation targeting a Person whose Album/Wall (which are Forum subtypes) contains its target Post

For the Person_likes_Post edge, the explicitlyDeleted attribute is true in case 1, and false for the other cases.
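
As a minimal sketch of this distinction, the two records from the question can be parsed and labelled by their explicitlyDeleted flag. The helper name `deletion_kind` and the labels it returns are hypothetical, chosen only for illustration. Note that the flag alone does not say which of cases 2 to 5 applies to a record marked false.

```python
import json

# The two Person_likes_Post records quoted in the question.
raw_lines = [
    '{"creationDate":1296583977045,"deletionDate":1577664000000,'
    '"explicitlyDeleted":false,"PersonId":13194139533355,"PostId":412316861128}',
    '{"creationDate":1296750065049,"deletionDate":1296750075058,'
    '"explicitlyDeleted":true,"PersonId":13194139533355,"PostId":412316861129}',
]


def deletion_kind(record):
    """Hypothetical helper: label a raw record by its explicitlyDeleted flag.

    "explicit"     -> a delete operation targets this edge itself (case 1).
    "not explicit" -> no delete operation targets this edge directly; the flag
                      does not tell us which other entity, if any, caused it.
    """
    return "explicit" if record["explicitlyDeleted"] else "not explicit"


records = [json.loads(line) for line in raw_lines]
kinds = {record["PostId"]: deletion_kind(record) for record in records}

for post_id, kind in kinds.items():
    print(post_id, kind)
```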

Note that this attribute is only part of the raw data set. The data sets used for the actual workload executions (Interactive, BI) only contain explicit delete operations, hence they omit this attribute.
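
To illustrate the idea (this is not the actual LDBC Datagen logic, only a sketch under assumptions), one could derive a list of explicit delete operations from raw records by keeping those with explicitlyDeleted set to true and dropping the attribute. The function name `explicit_deletes` and the output field layout are hypothetical.

```python
# Raw records in the shape shown in the question (values copied from it).
raw_records = [
    {"creationDate": 1296583977045, "deletionDate": 1577664000000,
     "explicitlyDeleted": False, "PersonId": 13194139533355, "PostId": 412316861128},
    {"creationDate": 1296750065049, "deletionDate": 1296750075058,
     "explicitlyDeleted": True, "PersonId": 13194139533355, "PostId": 412316861129},
]


def explicit_deletes(records):
    """Hypothetical helper: keep only explicitly deleted edges.

    Returns one delete operation per explicitly deleted edge, ordered by
    deletion time, without the explicitlyDeleted attribute.
    """
    ops = [
        {
            "deletionDate": r["deletionDate"],
            "PersonId": r["PersonId"],
            "PostId": r["PostId"],
        }
        for r in records
        if r["explicitlyDeleted"]
    ]
    return sorted(ops, key=lambda op: op["deletionDate"])


delete_ops = explicit_deletes(raw_records)
print(delete_ops)
```

In this sketch, edges removed by a cascade produce no operation of their own; the system under test would be expected to remove them when it processes the explicit delete of the related entity.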

Gabor Szarnyas
  • Thank you for the explanation! For data ingestion into a graph, it is clear how case 1 should be handled (delete only the `likes` edge). But for entries containing `"explicitlyDeleted":false`, what should the system do given all these possibilities? Or am I not looking at the right file to determine what to do? – cpchung Jan 07 '23 at 20:55
  • I just read your answer again: `Note that this attribute is only part of the raw data set. ` So clearly the data set I downloaded contains only the raw data. For the actual BI workload data, where can I download a small pre-generated data set like LDBC SF0.003? Is this the right one to use? https://ldbcouncil.org/ldbc_snb_datagen_spark/social-network-sf0.003-bi-parquet.zip – cpchung Jan 07 '23 at 21:10
  • Also, is there a small data set for testing isolation in the transactional workload, such as a pre-generated LDBC SF0.003 data set? – cpchung Jan 07 '23 at 21:16
  • I am not sure whether this answers the question above: for BI (read-only), I should use only the data in `initial_snapshot`, with the `dynamic` subfolder fully ingested before the BI run. For an HTAP benchmark, I should use the `deletes` and `inserts` folders. – cpchung Jan 07 '23 at 21:27
  • "Is this one the right one to use?" -- yes. For isolation testing, we have a separate ACID test suite at https://github.com/ldbc/ldbc_acid. For the OLTP workload (SNB Interactive), the data sets are at https://github.com/ldbc/ldbc_snb_interactive_impls/blob/main/snb-interactive-pre-generated-data-sets.md – Gabor Szarnyas Jan 07 '23 at 21:27