I have a Cypher query that returns a list of users recommended for a user to follow, but I am getting duplicate results when the query is executed.

Here is the Cypher query:

MATCH (user:User { id: $userId })
MATCH (user)-[interestRel:INTERESTED_IN]->()<-[:INTERESTED_IN]-(recommendedUsers)
WITH DISTINCT recommendedUsers, interestRel, user
WHERE NOT recommendedUsers = user AND
    NOT exists((user)-[:FOLLOWING]->(recommendedUsers))
RETURN recommendedUsers {
    .id,
    following: false
} ORDER BY interestRel.interestLevel DESC SKIP $skip LIMIT $limit

I understand there will be duplicates because a user might be INTERESTED_IN multiple nodes, so when the INTERESTED_IN relationship is traversed, the same user can be matched once for each node they share an INTERESTED_IN relationship with. But I am returning DISTINCT users, so I don't understand why duplicate users are still returned.

I noticed that when the INTERESTED_IN relationship is bound to a variable (interestRel) which is used in the query, that's when duplicate results are returned.

How do I get rid of the duplicates and still reference the INTERESTED_IN (interestRel) relationship?

Emmanuel

1 Answer

DISTINCT in your case keeps distinct combinations of recommendedUsers, interestRel, and user, not just distinct users. So when a recommended user has two interests in common with user, it is logical that they show up twice, once per interestRel.
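
To make the row-level behaviour of DISTINCT concrete, here is a minimal sketch that needs no graph data. The literal values ('alice', 'bob', 'r1', and so on) are made up for illustration and only stand in for matched users and relationships.

```cypher
// Hypothetical literal rows standing in for (recommendedUsers, interestRel) matches.
UNWIND [
  {user: 'alice', rel: 'r1'},
  {user: 'alice', rel: 'r2'},
  {user: 'bob',   rel: 'r3'}
] AS row
// DISTINCT compares the whole projected row, not just the first column.
WITH DISTINCT row.user AS recommendedUser, row.rel AS interestRel
RETURN recommendedUser, interestRel
// 'alice' is returned twice, because ('alice', 'r1') and ('alice', 'r2') are different rows.
```

Dropping interestRel from the WITH DISTINCT clause would collapse the two 'alice' rows into one, but the relationship would then no longer be available for ordering, which is why an aggregation is needed instead.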

It seems that you want to return the recommendedUsers who are interested in something that user is highly interested in.

That said, I would write the query as follows:

MATCH (user:User { id: $userId })-[interestRel:INTERESTED_IN]->()<-[:INTERESTED_IN]-(recommendedUser)
WITH user,recommendedUser,
     MAX(interestRel.interestLevel) AS interestLevel
WHERE NOT recommendedUser = user AND
      NOT exists((user)-[:FOLLOWING]->(recommendedUser))
RETURN interestLevel,
       recommendedUser {
                 .id,
                 following: false
       } ORDER BY interestLevel DESC SKIP $skip LIMIT $limit

NOTE: I use the singular form, recommendedUser, because I think it makes the query easier to understand.

Graphileon
  • Thank you, but using `MAX(interestRel.interestLevel)` will return the highest interest of the user and `ORDER` by that, which makes the ordering the same for all recommendedUser. I think what I need is `sum(interestRel.interestLevel)`, which will sum the `interestLevel` over all the paths where a `recommendedUser` is found. This makes the `interestLevel` different for each `recommendedUser`. – Emmanuel Nov 07 '21 at 18:46
  • You can indeed use SUM() too, depending on what you need. You could end up with 20 matches of level 1, compared to 4 matches of level 5 … – Graphileon Nov 07 '21 at 20:55
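
The sum-based ordering discussed in these comments could be sketched roughly as follows. This is only an illustration, not a tested solution: the aliases interestScore and recommendation are names introduced here, and the filter is written as a plain pattern predicate instead of exists(...) on the assumption that this form is accepted by the Neo4j version in use.

```cypher
MATCH (user:User { id: $userId })-[interestRel:INTERESTED_IN]->()<-[:INTERESTED_IN]-(recommendedUser)
// Filter before aggregating so excluded users are never grouped.
WHERE recommendedUser <> user
  AND NOT (user)-[:FOLLOWING]->(recommendedUser)
// One row per recommendedUser; the score adds up interestLevel over every shared interest.
WITH recommendedUser,
     sum(interestRel.interestLevel) AS interestScore
RETURN recommendedUser {
           .id,
           following: false
       } AS recommendation,
       interestScore
ORDER BY interestScore DESC SKIP $skip LIMIT $limit
```

Because recommendedUser is the only grouping key, each user appears once. As noted above, a sum rewards many low-level matches as much as a few high-level ones, so whether sum() or max() is the better score depends on what the ranking is meant to express.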