Parse a big file and populate a Neo4j database

Question

I am working on a Ruby on Rails project that reads and parses a fairly big text file (around 100k lines) and builds Neo4j nodes from that data (I am using Neography). This is the Neo4j-related part of the code I wrote:

    d = Neography::Rest.new.execute_query("MATCH (n:`Label`) WHERE (n.`name`='#{id}') RETURN n")
    d = Neography::Node.load(d, @neo)
    p = Neography::Rest.new.create_node("name" => "#{id}")
    Neography::Rest.new.add_label(p, "LabelSample")
    d = Neography::Rest.new.get_node(d)
    Neography::Rest.new.create_relationship("belongs_to", p, d)

So, what I want to do is: search the already populated db for the node whose "name" field matches the parsed data, create a new node for this data, and finally create a relationship between the two. Obviously this code simply takes way too long to run, so I tried Neography's batch, but I ran into some issues.

    p = Neography::Rest.new.batch [:create_node, {"name" => "#{id}"}]

gave me an "undefined method `split' for nil:NilClass" in

    id["self"].split('/').last

    d=Neography::Rest.new.batch [:get_node, d]

gives me a Neography::UnknownBatchOptionException for get_node

I am not even sure this will save me enough time either.

I also tried different ways to do this, using Batch Import for example, but I couldn't find a way to get the already created node I need from the db. As you can see I'm kinda new to this so any help will be appreciated. Thanks in advance.

score 1 · Accepted Answer · edited Feb 04 '15 at 17:53

You might be able to do this with pure cypher (or neography generated cypher). Something like this perhaps:

    MATCH (n:Label) WHERE n.name = {id}
    WITH n
    CREATE (p:LabelSample {name: n.name})-[:belongs_to]->(n)

Note that I'm using CREATE, but if you don't want to create duplicate LabelSample nodes you could do:

    MATCH (n:Label) WHERE n.name = {id}
    WITH n
    MERGE (p:LabelSample {name: n.name})
    CREATE (p)-[:belongs_to]->(n)

Note that I'm using params, which are generally recommended for performance (though this is just one query, so it's not as big of a deal).
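
To make the params point concrete, here is a minimal sketch of how the second query might be sent from Ruby. It assumes the client responds to `execute_query(query, params)` the way `Neography::Rest` does; `link_sample` and `data.txt` are hypothetical names used only for illustration. The Cypher text stays identical on every call and only the params hash changes.

```ruby
# The Cypher text is constant across calls; only the params hash varies.
LINK_QUERY = <<~CYPHER
  MATCH (n:Label) WHERE n.name = {id}
  WITH n
  MERGE (p:LabelSample {name: n.name})
  CREATE (p)-[:belongs_to]->(n)
CYPHER

# Hypothetical helper. `client` is assumed to respond to
# execute_query(query, params), as Neography::Rest does.
def link_sample(client, id)
  client.execute_query(LINK_QUERY, { "id" => id.to_s })
end

# Usage (needs the neography gem and a running Neo4j server):
#   require "neography"
#   neo = Neography::Rest.new
#   File.foreach("data.txt") { |line| link_sample(neo, line.strip) }
```

Note that the usage lines create the `Neography::Rest` client once and reuse it, rather than calling `Neography::Rest.new` for every row as in the question.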

edited Feb 04 '15 at 17:53

subvertallchris

answered Feb 04 '15 at 14:06

Brian Underwood

Thanks Brian, I gave it a try and the performance is noticeably better, but it still took around 25 minutes (down from "it has been 80 minutes, I give up" with my former code) to process my 80k-row sample dataset. Is there a way to improve the performance any further? – AGarofoli Feb 04 '15 at 15:37
@AGarofoli You'll get a huge performance boost from params if you're running that same query over and over again. Check out http://neo4j.com/docs/stable/cypher-parameters.html and replace the `{name: n.name}` with a param. You can also consider doing chunks of your updates in transactions that you close every X number of items. It'll give you another little boost, though I'm not sure how you'd do that with Neography. – subvertallchris Feb 04 '15 at 17:53
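
One possible way to get the "chunks of X items" effect without Neography's transaction API is to send each chunk as a single parameterized query using `UNWIND`. This is a different technique from closing a transaction every X items, and the sketch rests on several assumptions: the server supports `UNWIND` (Neo4j 2.1 or later), the client responds to `execute_query(query, params)`, and each such request runs as its own transaction. `link_in_chunks` and the default chunk size of 1000 are hypothetical. Newer Neo4j versions write the parameter as `$names` instead of `{names}`.

```ruby
# One statement handles a whole chunk: UNWIND turns the list param into rows.
BULK_LINK_QUERY = <<~CYPHER
  UNWIND {names} AS name
  MATCH (n:Label) WHERE n.name = name
  MERGE (p:LabelSample {name: name})
  CREATE (p)-[:belongs_to]->(n)
CYPHER

# Hypothetical helper: sends `names` in slices of `chunk_size`, one request
# per slice. `client` is assumed to respond to execute_query(query, params).
# Returns the number of requests sent.
def link_in_chunks(client, names, chunk_size = 1000)
  requests = 0
  names.each_slice(chunk_size) do |chunk|
    client.execute_query(BULK_LINK_QUERY, { "names" => chunk })
    requests += 1
  end
  requests
end
```

With an 80k-row file and the default chunk size this would issue 80 requests instead of 80,000; the best chunk size depends on the server and is worth measuring.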