I'm currently reading The Practitioner's Guide to Graph Data and am trying to solve the following problem, purely for learning purposes. It uses the book's movie dataset, which in this example consists of a "Tag" vertex, a "Movie" vertex, and a "rated" edge with a rating property whose value ranges from 1 to 5.
Just for practice, and to extend my understanding of concepts from the book, I would like to get all movies tagged with "comedy" and calculate the mean NPS for each movie. To do this, I want to aggregate all positive (+1) and neutral or negative (-1) ratings into a list, then divide the sum of these values by the number of values in the list (the mean). This is what I attempted:
dev.withSack{[]}{it.clone()}. // create a sack with an empty list that clones when split
  V().has('Tag', 'tag_name', 'comedy').
  in('topic_tagged').as('film'). // walk to movies tagged as comedy
  inE('rated'). // walk to the rated edges
  choose(values('rating').is(gte(3.0)),
    sack(addAll).by(constant([1.0])),
    sack(addAll).by(constant([-1.0]))). // add 1 or -1 to this movie's list, depending on the rating
  group().
    by(select('film').values('movie_title')).
    by(project('a', 'b').
         by(sack().unfold().sum()). // add all values from the list
         by(sack().unfold().count()). // count the values in the list
       math('a / b')).
  order(local).
    by(values, desc)
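For reference, the calculation I am aiming for can be written as a minimal sketch in plain Python rather than Gremlin. The function name `mean_nps` and the sample ratings are hypothetical and only there for illustration; the threshold of 3.0 is the one used in the traversal.

```python
def mean_nps(ratings, threshold=3.0):
    """Map each rating to +1 or -1 and return the mean of those scores."""
    # Ratings at or above the threshold count as +1, everything else as -1.
    scores = [1.0 if r >= threshold else -1.0 for r in ratings]
    if not scores:
        return None  # no ratings, so the mean is undefined
    return sum(scores) / len(scores)


sample_ratings = [5, 4, 2, 3, 1]  # hypothetical ratings for one movie
print(mean_nps(sample_ratings))
```

With a mix of ratings like this, the result should fall somewhere between -1.0 and 1.0 rather than exactly on either end.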
My traversal ends up with each movie's result being either "1.0" or "-1.0":
"Journey of August King The (1995)": "1.0",
"Once Upon a Time... When We Were Colored (1995)": "1.0", ...
In my testing, it seems the values aren't aggregating into the collection the way I expected. I've tried various approaches, but none of them produces my expected result.
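My current guess at what is happening, sketched in Python rather than Gremlin: if every "rated" edge is reached by its own traverser, and each traverser carries its own clone of the sack, then every list only ever holds a single value, so its mean can only be 1.0 or -1.0. The `Traverser` class and the sample ratings below are hypothetical stand-ins for illustration, not TinkerPop APIs, and this is an assumption about the behavior rather than a confirmed explanation.

```python
import copy


class Traverser:
    """Hypothetical stand-in for a traverser that carries its own sack."""

    def __init__(self, sack):
        self.sack = sack

    def split(self):
        # Mirrors the {it.clone()} split operator: each branch gets a copy.
        return Traverser(copy.copy(self.sack))


ratings = [5, 2, 4]    # hypothetical ratings on one movie's edges
movie = Traverser([])  # the sack starts as an empty list

per_edge_means = []
for rating in ratings:
    edge = movie.split()  # one traverser per "rated" edge
    edge.sack.extend([1.0] if rating >= 3.0 else [-1.0])
    # Each list holds exactly one value, so the mean equals that value.
    per_edge_means.append(sum(edge.sack) / len(edge.sack))

print(per_edge_means)
print(movie.sack)
```

In this model no list ever sees more than one rating, which would match the output I am getting.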
I am aware that I can achieve this result by adding and subtracting from a sack with an initial value of "0.0", then dividing by the edge count, but I am hoping for a more efficient solution by using a list and avoiding an additional traversal to the edges to get the count.
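In plain Python terms, that running-total alternative boils down to something like the sketch below. The function name `mean_nps_running` and the sample ratings are hypothetical; the point is only that the count has to be tracked as a separate quantity next to the total.

```python
def mean_nps_running(ratings, threshold=3.0):
    """Keep a running total of +1/-1 scores and divide by a separate count."""
    total = 0.0  # plays the role of the sack initialised to 0.0
    count = 0    # plays the role of the separately obtained edge count
    for r in ratings:
        total += 1.0 if r >= threshold else -1.0
        count += 1
    return total / count if count else None


print(mean_nps_running([5, 4, 2, 3, 1]))  # hypothetical ratings
```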
Is it possible to achieve my result using a list? If so, how?
Edit 1:
The much simpler code below, taken from Kelvin's example, aggregates the ratings for each movie by simply using the fold step:
dev.V().
  has('Tag', 'tag_name', 'comedy').
  in('topic_tagged').
  project('movie', 'result').
    by('movie_title').
    by(inE('rated').
       choose(values('rating').is(gte(3.0)),
         constant(1.0),
         constant(-1.0)).
       fold()) // replace fold() with mean() to calculate the mean, or do something with the collection
I feel a bit embarrassed that I completely forgot about the fold step, as folding and unfolding are so common. Overthinking, I guess.