Graph Databases: Retrieving the most complex Relationships using Gremlin

Question

I'm trying to write a Gremlin query that returns the traversed vertices and edges (with their properties) for the most complex starting vertices, i.e. those with the highest count of related vertices.

In other words, I want to retrieve the patients with the most codes, but there is not a direct relationship between Patients and Codes. This is the relationship and direction: Patient->Diagnosis<-Code

Here is my attempt:

g.V().hasLabel('Patient'). 
  outE().inV().
  inE().outV().
  path().
    by(elementMap()).
  order().
    by(count(local), asc).
  tail(2).
  unfold().
  toList()

I wanted this to return patient vertices with their traversed edges/vertices, only the top 2 based on the count of codes returned per patient. This is what I got:

[Image: single patient vertex with traversed edges/nodes]

Here is a sample insert to replicate the same relationships:

g
.addV('pat').property(id, 'p-0')
.addV('pat').property(id, 'p-1')
.addV('pat').property(id, 'p-2')
.addV('diag').property(id, 'd-0')
.addV('diag').property(id, 'd-1')
.addV('diag').property(id, 'd-2')
.addV('code').property(id, 'c-0')
.addV('code').property(id, 'c-1')
.V('p-0').addE('contracted').to(V('d-0'))
.V('p-0').addE('contracted').to(V('d-1'))
.V('p-0').addE('contracted').to(V('d-2'))
.V('p-1').addE('contracted').to(V('d-1'))
.V('p-2').addE('contracted').to(V('d-2'))
.V('c-0').addE('includes').to(V('d-0'))
.V('c-1').addE('includes').to(V('d-0'))
.V('c-1').addE('includes').to(V('d-1'))
.V('c-2').addE('includes').to(V('d-1'))

This is an example of the format I would like to return: I used ".path().by(elementMap()).unfold().toList()" after the vertex and edge steps to get this.

I want the output to be the vertices and edges that will produce a graph like this:

As you can see, out of three patients, I want to return the top 2 most complex patients (based on the number of codes their diagnoses have). I don't want to return the patient with just one code.

If by any chance you can add the `addV` and `addE` steps to the question that build a small sample graph, that will make providing a tested answer easier. What is your current query producing and what specifically would you like to change? — Kelvin Lawrence, Apr 05 '23 at 15:39
You will find an example of using `addV` and `addE` steps to create a sample graph in the answer to [this question](https://stackoverflow.com/questions/75734493/using-project-valuemap-in-combination-with-where/75737963#75737963) — Kelvin Lawrence, Apr 05 '23 at 15:44
@KelvinLawrence thanks for the tips, I added a sample insert — Dee, Apr 06 '23 at 14:01

Kelvin Lawrence · Accepted Answer · 2023-04-26T00:38:24.323

Thanks for providing the sample graph. That really helps. The following query is useful simply for viewing the graph visually.

g.V().hasLabel('pat').
  outE().inV().
  inE().outV().
  simplePath().
  path().by(elementMap())

Which, using graph-notebook, produces:

To find the number of codes for each starting patient, we might do this. It builds on the prior query but filters using edge labels.

g.V().hasLabel('pat').as('p').
  out('contracted').
  group().
    by(select('p').id()).
    by(in('includes').count())

which will give us the number of codes associated with each patient:

{'p-0': 3, 'p-2': 0, 'p-1': 1}
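
To sanity-check these counts without a graph server, the same arithmetic can be sketched in plain Python. This is only an in-memory model of the sample data, not Gremlin: the dictionaries and the function name `raw_code_counts` are made up for illustration, and the `c-2` edge is left out on the assumption that it is never created, since the sample insert adds no `c-2` vertex.

```python
# Plain-Python model of the sample graph (illustrative only, not Gremlin).
# Assumption: the c-2 -> d-1 edge is absent because no 'c-2' vertex is added.
contracted = {  # patient -> diagnoses reached via 'contracted'
    "p-0": ["d-0", "d-1", "d-2"],
    "p-1": ["d-1"],
    "p-2": ["d-2"],
}
includes = {  # diagnosis -> codes that have an 'includes' edge into it
    "d-0": ["c-0", "c-1"],
    "d-1": ["c-1"],
    "d-2": [],
}


def raw_code_counts(contracted, includes):
    """Count every (diagnosis, code) pair per patient, duplicates included."""
    return {
        patient: sum(len(includes.get(d, [])) for d in diagnoses)
        for patient, diagnoses in contracted.items()
    }


counts = raw_code_counts(contracted, includes)
print(counts)  # {'p-0': 3, 'p-1': 1, 'p-2': 0}
```

Here p-0 reaches code c-1 through both d-0 and d-1, which is why it is counted twice.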

However, you may not want this double counting where the code is shared by more than one diagnosis. In that case we can dedup the results.

g.V().hasLabel('pat').as('p').
  out('contracted').
  group().
    by(select('p').id()).
    by(in('includes').dedup().count())

which reduces the count for p-0 to 2 and removes p-2 completely as there are no codes.

{'p-0': 2, 'p-1': 1}
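
The effect of deduplicating can be checked with a small plain-Python model of the sample data. This is a sketch, not Gremlin: the dictionaries and the name `distinct_code_counts` are hypothetical, the `c-2` edge is assumed absent because no `c-2` vertex is added in the sample, and patients with no codes are skipped only to mirror the result shown above.

```python
# Plain-Python model of the sample graph (illustrative only, not Gremlin).
# Assumption: the c-2 -> d-1 edge is absent because no 'c-2' vertex is added.
contracted = {  # patient -> diagnoses reached via 'contracted'
    "p-0": ["d-0", "d-1", "d-2"],
    "p-1": ["d-1"],
    "p-2": ["d-2"],
}
includes = {  # diagnosis -> codes that have an 'includes' edge into it
    "d-0": ["c-0", "c-1"],
    "d-1": ["c-1"],
    "d-2": [],
}


def distinct_code_counts(contracted, includes):
    """Count each code once per patient, however many diagnoses share it."""
    counts = {}
    for patient, diagnoses in contracted.items():
        codes = {c for d in diagnoses for c in includes.get(d, [])}
        if codes:  # skip patients without codes, mirroring the result above
            counts[patient] = len(codes)
    return counts


distinct = distinct_code_counts(contracted, includes)
print(distinct)  # {'p-0': 2, 'p-1': 1}
```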

UPDATED

Based on additional discussion in comments, this query uses the `group()` counts as a filter. The value 2 in `is(2)` is hard-coded and matches the deduplicated count for p-0 above.

g.V().hasLabel('pat').as('p').
  outE('contracted').inV().
  where(
    group().
      by(select('p').id()).
      by(in('includes').dedup().count()).
    select(values).unfold().is(2)).
  inE().outV().
  path().by(elementMap())

When rendered visually
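
The "top 2 patients and their relationships" that the question asks for can also be sketched in plain Python, as a way to work out which vertices and edges should come back. This is a model of the sample data rather than a Gremlin query. The names `top_patients` and `subgraph_edges` are hypothetical, ties are broken by patient id as an arbitrary assumption, and the `c-2` edge is assumed absent because no `c-2` vertex is added in the sample.

```python
# Plain-Python model of the sample graph (illustrative only, not Gremlin).
# Assumption: the c-2 -> d-1 edge is absent because no 'c-2' vertex is added.
contracted = {  # patient -> diagnoses reached via 'contracted'
    "p-0": ["d-0", "d-1", "d-2"],
    "p-1": ["d-1"],
    "p-2": ["d-2"],
}
includes = {  # diagnosis -> codes that have an 'includes' edge into it
    "d-0": ["c-0", "c-1"],
    "d-1": ["c-1"],
    "d-2": [],
}


def top_patients(contracted, includes, n):
    """Return the n patients with the most distinct codes (ties: by id)."""
    distinct = {
        p: {c for d in ds for c in includes.get(d, [])}
        for p, ds in contracted.items()
    }
    ranked = sorted(distinct, key=lambda p: (-len(distinct[p]), p))
    return ranked[:n]


def subgraph_edges(patients, contracted, includes):
    """Collect (from, label, to) edges around the given patients, once each."""
    edges = []
    for p in patients:
        for d in contracted[p]:
            edges.append((p, "contracted", d))
            edges.extend((c, "includes", d) for c in includes.get(d, []))
    return list(dict.fromkeys(edges))  # drop repeats, keep first-seen order


top = top_patients(contracted, includes, 2)
edges = subgraph_edges(top, contracted, includes)
print(top)  # ['p-0', 'p-1']
for edge in edges:
    print(edge)
```

Ranking first and expanding afterwards avoids hard-coding a particular count, at the cost of walking the diagnoses twice.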

edited Apr 26 '23 at 00:38

answered Apr 06 '23 at 19:04

Kelvin Lawrence


Thanks so much for your help, that makes sense. I'm looking to return the patient vertex (with their diagnoses, codes and edges) based on the highest count. Is that possible? – Dee Apr 12 '23 at 09:20
Could you please add an update to the question showing exactly the result you would like to get back? – Kelvin Lawrence Apr 12 '23 at 15:29
Updated with an example of the format I'd like to return. Thanks! – Dee Apr 14 '23 at 20:21
Sorry I'm a bit confused now - so in your example you no longer want the groupings like `{'p-0': 2, 'p-1': 1}` as in the original question? – Kelvin Lawrence Apr 14 '23 at 20:30
Hi Kelvin, I want to return the vertices and edges of the top patients by code count. I added a diagram to the question, I hope this clarifies it. I'm looking to return the actual graph so I can visually see the top patients and their relationships – Dee Apr 18 '23 at 12:09
Is there anything else I can provide to help explain the problem? – Dee Apr 25 '23 at 12:54
I added to the answer based on discussion in comments. – Kelvin Lawrence Apr 26 '23 at 00:38
