0

I'm starting in Neo4j and I'm doing the GraphAcademy intermediate cypher course.

The use case is to find the highest average rating by a user for a Tom Hanks Movie, so I wrote the query like so

match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with avg(r.rating) as average, r.rating as rating, m.title as movie
return movie, average, rating order by average desc limit 1

which returns "Catch Me If You Can" 5.0 5.0 ---> wrong answer...

if I instead take the rating out of the query as

match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with avg(r.rating) as average,  m.title as movie
return movie, average order by average desc limit 1

it returns "Captain Phillips" 4.2 ---> correct answer

why is the returned value different if I include rating? results are ordered by average in both of them...but if I include rating highest average becomes 5.0 against the 4.2 when rating is omitted

I also tried to compute the average in the return clause but with the same wrong result

match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)

with r.rating as rating, m.title as movie
return movie, avg(rating) as average, rating order by average desc limit 1
Graphileon
  • 5,275
  • 3
  • 17
  • 31
Vincenzo
  • 5,304
  • 5
  • 38
  • 96
  • Does this answer your question? [neo4j query result changed base on return node and properties](https://stackoverflow.com/questions/72336183/neo4j-query-result-changed-base-on-return-node-and-properties) – Charchit Kapoor Aug 31 '22 at 09:10

1 Answers1

1

avg() is a grouping function, and by including r.rating AS rating in the WITH statement, it will calculate the avg() by rating. So it groups all the cases where rating is the same and calculates the average.

When you remove the LIMIT 1 in your first query, you will see that you will get a number of rows with the same value for rating and average.

Graphileon
  • 5,275
  • 3
  • 17
  • 31
  • yes indeed I get all results with 5.0 value on both rating and average and I thought that avg() could be taking in also rating values..which it also are r.rating values. I dough find it weird.. what if I do want do see average rating and actual rating? Should I also compute the actual rating instead of just reading the relationship value? – Vincenzo Aug 31 '22 at 09:48
  • Can you be more specific about the desired result? An actual rating relates to a specific rating, while an average always relates to a group of ratings. I suggest you add the desired result to the original question. – Graphileon Aug 31 '22 at 10:01
  • sure. I' d like to see the movie rating ( which I just remembered is actually m.imdbRating ) property, the average rating ( which is calulated averaging out all r.ratings ) and the movie title, and the highest rating given to the movie.. So I'd answer the use case "what is the highest rated movie" ..and display also actual rating for the movie.. Now.. I was trying to get the averaged out r.rating ( which should correspond to the movie rating) with `avg(r.rating) as average`, and the highest rating received with `r.rating as rating order by r.rating desc limit 1` – Vincenzo Aug 31 '22 at 10:13
  • I don't want to use m.imdbRating which shows not an average of received ratings but rather some other value calculated differently as it shows higher value ( 7.9) that averaged out received ratings (4.2) .. – Vincenzo Aug 31 '22 at 10:18
  • Ok I understood what you meant with `So it groups all the cases where rating is the same and calculates the average.`. I see that if instead omit the `r.rating as rating` it actually gathers all values from `r.rating` and uses it in the aggregation functions.. I can either pass just the connection `r` in the `with` clause and then in the `return` clause define all the functions I need like: `'avg: ' + avg(r.rating) as average, 'max-rat: ' + max(r.rating) as maxRat, 'rat: ' + r.rating as rating, 'title: ' + m.title as movie` or do it directly in the `with` clause and just return the aliases – Vincenzo Aug 31 '22 at 14:34