I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:
The graph is generally fine, but upon further inspection, you can see that both sets
Guy1 vs Guy2,
and
Guy4 vs Guy5
have a redundant relationship each.
The problem is obviously in the data, where there is a extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):
I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:
1) Don't read in the extra relationship in the first place
2) Go ahead and create the extra relationship, but then remove it later.
Thanks in advance for any advice on this.
The code I'm using is this:
/ Here, we load and create nodes
LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
player_id: line.player_id
})
ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name
ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence
// Here create the edges
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)