I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:

[image: graph of the five players and their VERSUS relationships]

The graph is generally fine, but on closer inspection two pairs, Guy1 vs Guy2 and Guy4 vs Guy5, each have a redundant relationship.

The problem is obviously in the data, where there is an extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):

[image: rows of the edges csv, including the extraneous rows]

I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:

1) Don't read in the extra relationship in the first place

2) Go ahead and create the extra relationship, but then remove it later.

Thanks in advance for any advice on this.

The code I'm using is this:

// Here, we load and create nodes

LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
  player_id: line.player_id
})

ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name

ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence

// Here create the edges

LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)
Monica Heddneck
  • This is not directly related to your issue, but these queries have a lot of extraneous clauses. 1. The `ON CREATE blah`/`ON MATCH blah` pairs can be replaced by just a single `blah`. 2. None of the `WITH` clauses are serving any purpose, and can be removed. – cybersam May 14 '16 at 23:45
  • For #1, what is the preferred syntax? – Monica Heddneck May 15 '16 at 00:38
  • Since you want to perform exactly the same `SET` operations, no matter if the `MERGE` created a new node or matched an existing node, you should not use `ON MATCH` and `ON CREATE` at all. Just perform your 2 different `SET` operations directly: `SET p.name = line.name, p.residence = line.residence`. – cybersam May 15 '16 at 19:23
  • Ahhhh...yes that makes sense. `ON CREATE / ON MATCH` basically just means `SET`. Thanks!! – Monica Heddneck May 15 '16 at 20:10
  • but wait. I used `ON MATCH` and `ON CREATE` as a solution since my data has some missing values and `MERGE` is lousy with missing values. – Monica Heddneck May 16 '16 at 02:41
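
The simplification suggested in these comments might look like the sketch below. It assumes the same node csv and columns as the question (the elided file path is left as-is), and it assumes missing values arrive as `null`: assigning `null` with `SET` simply leaves the property unset, so only a missing `player_id` inside the `MERGE` pattern itself would be a problem.

```cypher
// Load and create nodes. SET runs whether MERGE created the node
// or matched an existing one, so ON CREATE / ON MATCH are not needed.
LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
MERGE (p:Player {player_id: line.player_id})
SET p.name = line.name,
    p.residence = line.residence
```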

4 Answers

8

It is obvious that you don't need this extra relationship, as it adds neither value nor weight to the graph.

There is something that few people are aware of, despite being in the documentation.

MERGE can be used on undirected relationships; Neo4j will pick one direction for you (as relationships MUST be directed in the graph).

Documentation reference : http://neo4j.com/docs/stable/query-merge.html#merge-merge-on-an-undirected-relationship

For example, take the following statement and run it for the first time:

MATCH (a:User {name:'A'}), (b:User {name:'B'}) 
MERGE (a)-[:VERSUS]-(b)

It will create the relationship as it doesn't exist. However if you run it a second time, nothing will be changed nor created.

I guess it would solve your problem, as you will not have to worry about cleaning the data up front or running scripts afterwards to clean your graph.
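
Applied to the edge import from the question, the idea could be sketched as follows. This assumes the same csv columns as the question (`player1_id`, `player2_id`) and keeps the elided file path as-is; it is an illustration, not a tested import script.

```cypher
// Create at most one VERSUS relationship per pair of players,
// whichever order the two players appear in on a given row.
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
MATCH (p1:Player {player_id: line.player1_id})
// plain MATCH (not OPTIONAL MATCH): rows with an unknown player are skipped
MATCH (p2:Player {player_id: line.player2_id})
// undirected MERGE: matches an existing relationship in either direction
MERGE (p1)-[:VERSUS]-(p2)
```

With this, a row for Guy1 vs Guy2 followed by a row for Guy2 vs Guy1 should leave a single relationship, because the second `MERGE` matches the one already created.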

Christophe Willemsen
  • Not every player ends up playing each other, though (for example, if you look at the graph you can see Guy3 and Guy5 don't actually play). Would this code snippet create that nonexistent match? – Monica Heddneck May 14 '16 at 22:04
  • Well, if there is no row in your edges.csv file that represents a relationship between Guy3 and Guy5, no, it will not be created – Christophe Willemsen May 14 '16 at 22:08
  • Ah, I see what you mean. Something like `MATCH (p1:Player {player_id: line.player1_id}), (p2:Player {player_id: line.player2_id}) MERGE (p1)-[:VERSUS]-(p2)`. I still get this warning that I've seen before: `This query builds a cartesian product between disconnected patterns.` – Monica Heddneck May 14 '16 at 22:15
  • If you have an index on :Player(player_id), don't worry about the warning – Christophe Willemsen May 14 '16 at 23:03
2

I'd suggest creating a "match" node like so

(x:Player)-[:MATCH]->(m:Match)<-[:MATCH]-(y:Player) 

to enable tracking details about the match separate from the players.

If you need to track player matchups distinct from the matches themselves, then

(x:Player)-[:HAS_PLAYED]->(pair:HasPlayed)<-[:HAS_PLAYED]-(y:Player)

would do the trick.

Tim Kuehn
  • I was planning on using the edges to hold information about the matches... were you recommending a change to the schema? I'd prefer to keep it as it is, and just remove the redundancies. – Monica Heddneck May 14 '16 at 22:04
  • Tim - While it might be a good suggestion to have a Match node (or at least make for an interesting discussion), suggesting this doesn't answer the core question of removing redundant relationships. – David Makogon May 15 '16 at 02:58
  • With all due respect, the core question isn't the redundant relationships, it's how the schema is organized. – Tim Kuehn May 15 '16 at 18:41
  • The problem could be solved in many ways, I appreciate both of your time and input immensely as I ramp up my Neo4j understanding. I've already learned so much from SO and all input is greatly appreciated. – Monica Heddneck May 15 '16 at 19:06
2

If the schema has to stay as-is and the only requirement is to remove redundant relationships, then

MATCH (p1:Player)-[r1:VERSUS]->(p2:Player)-[r2:VERSUS]->(p1)
DELETE r2

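should do the trick, with one caveat. This finds all p1, p2 nodes with bi-directional VERSUS relationships, but as written it matches each pair twice (once with each player as p1), so both relationships are deleted. Restricting the match, for example with the `WHERE id(p1) < id(p2)` filter from the comments, removes only one of them.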
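
A sketch of the id-ordered variant suggested in the comments on this answer, which considers each pair of players only once and so deletes a single relationship per pair. It assumes at most one VERSUS relationship in each direction between two players.

```cypher
MATCH (p1:Player)-[r1:VERSUS]->(p2:Player)-[r2:VERSUS]->(p1)
// keep only one ordering of each pair, so r1 survives and r2 is removed
WHERE id(p1) < id(p2)
DELETE r2
```

Comparing `p1.player_id < p2.player_id` instead would serve the same purpose if every player has a distinct `player_id`; recent Neo4j versions also deprecate `id()` in favor of `elementId()`.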

Tim Kuehn
  • Should be `MATCH (p1:Player)-[r1:VERSUS]->(p2:Player)-[r2:VERSUS]->(p1) WHERE id(p1) < id(p2) DELETE r2` – nmervaillie Nov 24 '17 at 12:36
  • Running your code deleted both directions. I should have read the comment before trying it, which seems to fix it. But from now on I'll just avoid creating them. – Aaron Bramson Oct 18 '18 at 02:48
  • This should be the accepted answer for option #2 in the OP's question. Works great in my case too! – prrao Sep 27 '19 at 16:45
1

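You can use collect() and tail() together with UNWIND to do the trick.

MATCH (p1:Player)-[r:VERSUS]-(p2:Player)
// consider each pair of players once, not once per ordering
WHERE id(p1) < id(p2)
WITH p1, p2, collect(r) AS rels
// tail(rels) is every relationship except the first
UNWIND tail(rels) AS rel
DELETE rel;

The code above finds the direct connections of type VERSUS between p1 and p2 using MATCH (note that the pattern is not directed). The WHERE clause keeps a single ordering of each pair, so the relationships between two players are collected only once. The query then collects those relationships and deletes all but the first of them. Of course you can add a check to see whether the size of the collection is 2.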

gfhuertac