
I have the following two node types:

c:City {name: 'blah'}
s:Course {title: 'whatever', city: 'New York'}

Looking to create this:

(s)-[:offered_in]->(c)

I'm trying to find all courses that are NOT yet tied to a city and create the relationship to the matching city (creating the city if it doesn't exist). The problem is that my dataset is about 5 million nodes, and any query I run times out unless I work in increments of 10k.

Does anybody have any advice?

EDIT:

Here is the query for jobs that I'm running now. It has to be done in 10k chunks (out of millions), because even a single chunk takes a few minutes. It creates the city if it doesn't exist:

MATCH (j:Job)
WHERE NOT HAS(j.merged) AND HAS(j.city)
WITH j
LIMIT 10000
MERGE (c:City {name: j.city})
WITH j, c
MERGE (j)-[:in]->(c)
SET j.merged = 1
RETURN count(j)

(For now I don't know of a good way to filter out the jobs that are already matched, so I'm tagging them with a custom "merged" property, which I already have an index on.)
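
For what it's worth, the chunking described above boils down to "run the LIMIT query again until it reports zero rows". Here is a minimal Python sketch of that loop. The names `run_in_batches`, `run_batch` and `make_fake_batch_runner` are hypothetical, and the fake runner is only an in-memory stand-in for the database so that the sketch runs on its own; in a real script `run_batch` would execute the Cypher query above and return its `count(j)`.

```python
from typing import Callable


def run_in_batches(run_batch: Callable[[int], int], batch_size: int = 10_000) -> int:
    """Call run_batch(batch_size) repeatedly until it processes nothing.

    run_batch is assumed to process at most batch_size pending jobs and
    return how many it actually processed (like count(j) in the query).
    """
    total = 0
    while True:
        processed = run_batch(batch_size)
        if processed == 0:
            return total
        total += processed


def make_fake_batch_runner(job_count: int) -> Callable[[int], int]:
    """In-memory stand-in for the database (an assumption, not Neo4j)."""
    jobs = [{"merged": False} for _ in range(job_count)]

    def run_batch(limit: int) -> int:
        # Pick up to `limit` jobs that are not tagged yet, then tag them,
        # mirroring WHERE NOT HAS(j.merged) ... LIMIT ... SET j.merged = 1.
        pending = [job for job in jobs if not job["merged"]][:limit]
        for job in pending:
            job["merged"] = True
        return len(pending)

    return run_batch


if __name__ == "__main__":
    print(run_in_batches(make_fake_batch_runner(25_000)))
```

The loop stops on the first empty batch, so it only terminates if every processed job is actually excluded from the next batch (here via the "merged" tag).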

Diaspar
  • Can you share what you are currently trying? – JohnMark13 Sep 04 '14 at 17:22
  • I don't think this can be answered without more context (and matching your question to your update, I assume job == course and in == offered_in). Are you operating on existing data, or is this a bulk import? Could you tell us a bit about your system setup? Instead of 'merged' you could use WHERE NOT (j)-[:in]->(). – JohnMark13 Sep 04 '14 at 19:22
  • Are you seeing the timeout through the browser interface? – Jim Biard Sep 04 '14 at 21:40

1 Answer


500000 is a fair few nodes, and in your other question you suggested that 90% of them lack the relationship you want to create here, so this is going to take a bit of time. Without knowing more about your system (spec, Neo4j setup, programming environment) and when you are running this (on existing data or on insert), this is just a best guess at a tidier solution:

MATCH (j:Job)
WHERE NOT (j)-[:IN]->() AND HAS(j.city)
MERGE (c:City {name: j.city})
MERGE (j)-[:IN]->(c)
RETURN count(j)

Obviously you can add your limits back as required.

JohnMark13
  • It's 5,000,000 nodes. For some reason the query that I had (and this one) produces tons of duplicate cities... I've given up and just written a Python script to do this manually in small chunks. It will take forever, but it seems to preserve uniqueness and does what I need. On a related note, I've tried playing with "CREATE UNIQUE" as well (since it is supposed to guarantee uniqueness), but it kept telling me either "unbound pattern" or that it should not be used this way (or something along those lines), so I never got that to work. – Diaspar Sep 05 '14 at 20:26
  • What indexes have you defined and were you using the two merge statements as above? A single merge (`MERGE (j)-[:IN]->(c:City {name: j.city})`) would produce duplicates due to the unmatched unbound pattern. Could you update your question with more information (what you have tried, and what failed) as this should be quite possible! – JohnMark13 Sep 06 '14 at 07:08
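
To illustrate the point about the single merge: MERGE either matches its whole pattern or creates the whole pattern, reusing only nodes that are already bound. The toy Python model below mimics that rule in memory. It is not Neo4j and says nothing about concurrent transactions or indexes; the function names and the sample jobs are made up for illustration.

```python
def merge_whole_pattern(jobs):
    """Model of MERGE (j)-[:IN]->(c:City {name: j.city}) with only j bound.

    If job j has no existing :IN relationship to a city with that name,
    the whole unmatched part is created, including a brand-new City node.
    """
    cities = []   # one entry per City node; the list index is the node id
    rels = set()  # (job_id, city_index) pairs
    for job_id, city_name in jobs:
        matched = any(j == job_id and cities[c] == city_name for j, c in rels)
        if not matched:
            cities.append(city_name)  # new City even if the name already exists
            rels.add((job_id, len(cities) - 1))
    return cities, rels


def merge_city_then_rel(jobs):
    """Model of MERGE (c:City {name: j.city}) followed by MERGE (j)-[:IN]->(c)."""
    cities = []
    rels = set()
    for job_id, city_name in jobs:
        if city_name not in cities:  # first MERGE: find or create the city
            cities.append(city_name)
        rels.add((job_id, cities.index(city_name)))  # second MERGE: find or create the rel
    return cities, rels


# Hypothetical sample data: (job id, city name)
jobs = [(1, "New York"), (2, "New York"), (3, "Boston")]

if __name__ == "__main__":
    print(merge_whole_pattern(jobs)[0])
    print(merge_city_then_rel(jobs)[0])
```

In this model the whole-pattern version ends up with one City per job, while the two-step version ends up with one City per distinct name.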