
I have a non-unique node (:Neighborhood) that uniquely appears [:IN] a (:City) node. I would like to create a new neighborhood node and establish its relationship ONLY if that neighborhood node does not already exist in that city. Multiple neighborhoods can share the same name, but each neighborhood must appear only once in its city.

Following the advice from Gil's answer here: Return node if relationship is not present, how can I do something like:

MATCH a WHERE NOT (a:Neighborhood {name : line.Neighborhood})-[r:IN]->(c:City {name : line.City})
ON MATCH SET (a)-[r]-(c)

So then it would only create a new neighborhood node if it doesn't already exist in the city.

**UPDATE:** I upgraded and profiled it, and I still can't take advantage of any optimizations...

PROFILE LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
WITH line LIMIT 0
MATCH (c:City { name : line.City})
MERGE (n:Neighborhood {name : toInt(line.Neighborhood)})-[:IN]->(c);


+--------------+------+--------+----------------------------------+------------------------+
|     Operator | Rows | DbHits |                      Identifiers |                  Other |
+--------------+------+--------+----------------------------------+------------------------+
|  EmptyResult |    0 |      0 |                                  |                        |
|  UpdateGraph |    5 |     16 | anon[340], b, neighborhood, line |           MergePattern |
|  SchemaIndex |    5 |     10 |                          b, line | line.City; :City(name) |
| ColumnFilter |    5 |      0 |                             line |      keep columns line |
|       Filter |    5 |      0 |                  anon[216], line |              anon[216] |
|      Extract |    5 |      0 |                  anon[216], line |              anon[216] |
|        Slice |    5 |      0 |                             line |           {  AUTOINT0} |
|      LoadCSV |    5 |      0 |                             line |                        |
+--------------+------+--------+----------------------------------+------------------------+
NumenorForLife
  • What does it look like if you change the `LIMIT` to something like 5? How many rows are in your CSV? – Brian Underwood May 15 '15 at 15:13
  • @BrianUnderwood That's what it looks like when I do it with a limit of 5 – NumenorForLife May 15 '15 at 21:56
  • Another stab: Is there an index on `Neighborhood.name`? – Brian Underwood May 15 '15 at 23:02
  • The index on neighborhood name shouldn't be unique, because there can be the same neighborhood name in multiple cities, right? – NumenorForLife May 25 '15 at 19:19
  • If the same neighborhood name can be in multiple cities then yes, it shouldn't be unique. In Neo4j unique indexes are called constraints, so I would suggest just a plain index (sorry for the delay, I was on holiday until today) – Brian Underwood Jun 01 '15 at 12:31
  • @BrianUnderwood I figured that out, and created a new question on optimizing it here: http://stackoverflow.com/questions/30444845/how-can-i-create-a-constraint-on-unique-relationships-in-neo4j – NumenorForLife Jun 01 '15 at 12:48
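
The plain (non-unique) index suggested in the comments above could be created as sketched below. This uses the Neo4j 2.x syntax that matches the versions discussed in this thread; newer releases use a different `CREATE INDEX` form, so treat it as a version-specific sketch.

```cypher
// Plain index: speeds up lookups by name but still allows duplicates,
// so the same neighborhood name can exist in several cities.
CREATE INDEX ON :Neighborhood(name);
```

A uniqueness constraint on `:Neighborhood(name)` would not fit here, because it would reject a second neighborhood with the same name in a different city.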

1 Answer


I think you could simply use MERGE for this:

MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})

If you haven't already imported all of the cities, you can create those with MERGE:

MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})
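
As posted, this second snippet is the same as the first. Assuming it was meant to create missing cities, as the sentence introducing it says, the city lookup would presumably be a MERGE as well. A sketch of that reading (an assumption, not confirmed in the answer):

```cypher
// Assumed intent: create the city if it is missing, otherwise reuse it.
MERGE (c:City {name: line.City})
// Then create the neighborhood only if this city does not already have one with that name.
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})
```

Because `c` is already bound, the second MERGE only matches neighborhoods attached to that city, so a neighborhood with the same name in another city does not prevent the creation.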

But beware of the Eager operator:

http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

In short: You should run your LOAD CSV (I assume that's what you're doing here) twice, once to load the cities and once to load the neighborhoods.
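
A minimal sketch of that two-pass load, assuming a CSV with `City` and `Neighborhood` headers as in the question. The file URL is a placeholder, and the `USING PERIODIC COMMIT` batch size is an arbitrary choice rather than a recommendation from this thread:

```cypher
// Pass 1: create each city once. MERGE skips cities that already exist,
// so repeated city names across CSV rows are harmless.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///path/to/neighborhoods.csv" AS line
MERGE (c:City {name: line.City});

// Pass 2: every city now exists, so a plain MATCH finds it, and the
// MERGE creates the neighborhood only if it is not already in that city.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///path/to/neighborhoods.csv" AS line
MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(n:Neighborhood {name: line.Neighborhood});
```

Splitting the load this way keeps each statement to a single kind of write, which is the situation the linked article describes for avoiding the Eager operator.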

Brian Underwood
  • On a side note: Neo4j 2.2.1 contains a lot of improvements to prevent eagerness. – Stefan Armbruster Apr 26 '15 at 08:58
  • Brian, there's no difference between the two statements you wrote. Unfortunately, I'm getting an Eager when I do this because of that MATCH statement. – NumenorForLife Apr 26 '15 at 14:56
  • Is it possible for you to try Neo4j 2.2.1 as Stefan suggested? That could potentially help – Brian Underwood Apr 26 '15 at 20:50
  • I could make another post about this, but perhaps you can point me in the right direction. The Deployment Upgrading page (http://neo4j.com/docs/2.2.1/deployment-upgrading.html) doesn't outline how to upgrade from what I'm using (2.2.0) to 2.2.1. Do you have any recommendations of how to do this? – NumenorForLife Apr 26 '15 at 23:45
  • Set allow_store_upgrade to true in neo4j.properties. – Stefan Armbruster Apr 27 '15 at 07:21
  • That's the one, though I'm a bit surprised that it's required for a patch version upgrade – Brian Underwood Apr 27 '15 at 08:26
  • @StefanArmbruster Should I keep allow_store_upgrade set to true from this point onwards? Do I need to do anything else besides setting this to true? After setting this to true and restarting bin/neo4j, I am still at version 2.2.0 – NumenorForLife Apr 27 '15 at 16:27
  • I am still seeing an eager pop up after setting allow_store_upgrade to true – NumenorForLife Apr 27 '15 at 20:03
  • Ah, one step back: You need to download 2.2.1 and (with no servers started up) copy the `data/graph.db` folder from the 2.2.0 instance to the 2.2.1 instance. Then make sure that `allow_store_upgrade` is set to `true` in the 2.2.1 instance before starting it up. – Brian Underwood Apr 27 '15 at 21:16
  • @BrianUnderwood I updated the allow_store_upgrade, and followed your recommendation. I don't get an eager as you see above... – NumenorForLife May 15 '15 at 00:07
  • Sorry, I'm a bit lost since it's been a while. I wouldn't expect you to get an Eager from the query that you put in your edit even before Neo4j 2.2.1. Is your load still slow? – Brian Underwood May 15 '15 at 09:01
  • @BrianUnderwood it still is. I ran it for over 15-20 minutes, and it still wasn't completing. – NumenorForLife May 15 '15 at 11:52