2

I have a dataset of 60,000 items in MySQL and I'm trying to insert it into Neo4j. The insertion works, but it's very slow (approx. 10-15 items per 3 seconds). Is there any way I can speed it up? Also, is there something like a unique key in Neo4j, so that duplicate entries won't get inserted? I'm new to Neo4j.

I'm using Neo4j 1.8 with the PHP Everyman driver.

hablema
  • Can you be more specific about this operation? As far as I know, you can't just make Neo4j 'parse in' MySQL files; you have to extract the data from MySQL first and store it in Neo4j second, so performance depends on both steps. Do you use BatchInserter? – raina77ow Dec 03 '12 at 15:55
  • The data was extracted from MySQL using a SELECT query, and I don't use a BatchInserter for this. As I'm new to this, I'm following the example from jadell (https://github.com/jadell/neo4jphp/blob/master/examples/bacon.php). Is there a better way to do it? – hablema Dec 03 '12 at 16:01
  • The PHP code you referred to does a single HTTP operation for each node creation and property setting. This should at least be REST-batched, better done in Cypher (also REST-batched), or done using one of the importer tools. – Michael Hunger Dec 05 '12 at 01:45

2 Answers

2

There is a nice presentation from Max De Marzi about ETL into Neo4j.

See: http://www.slideshare.net/maxdemarzi/etl-into-neo4j

It depends which language you want to use; there are lots of options, from embedded Java via JRuby to remote access via Ruby, PHP, or Python.

You would want to batch your requests into appropriately sized transactions (e.g. 10k items per transaction).
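If you stay with the Everyman/neo4jphp client, its batch mode queues operations and sends them to the server as one REST batch request instead of one HTTP call per node. A minimal sketch, assuming the neo4jphp autoloader is available; the MySQL table and column names ('items', 'id', 'name') are hypothetical placeholders for your schema:

```php
<?php
require('vendor/autoload.php'); // or however you load neo4jphp

use Everyman\Neo4j\Client;

$client = new Client('localhost', 7474);
$mysql  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$batchSize = 1000; // tune this; see the discussion of batch sizes
$count = 0;

$client->startBatch();
foreach ($mysql->query('SELECT id, name FROM items') as $row) {
    $client->makeNode()
           ->setProperty('mysql_id', $row['id'])
           ->setProperty('name', $row['name'])
           ->save();

    if (++$count % $batchSize == 0) {
        $client->commitBatch(); // one HTTP request for the whole chunk
        $client->startBatch();
    }
}
if ($count % $batchSize != 0) {
    $client->commitBatch(); // flush the remaining operations
}
```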

It is possible to import CSV files directly into a database file using my batch-importer, or via the batch REST API of the Neo4j server.
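To give an idea of the first route: the batch-importer consumes tab-separated files, and generating those from MySQL in PHP is only a few lines. A rough sketch (the exact file format and the table/column names are assumptions, check the importer's README):

```php
<?php
// Dump MySQL rows to a tab-separated file for the batch-importer.
$mysql = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$out   = fopen('nodes.csv', 'w');

fwrite($out, "mysql_id\tname\n"); // header row: one column per node property
foreach ($mysql->query('SELECT id, name FROM items') as $row) {
    fwrite($out, $row['id'] . "\t" . $row['name'] . "\n");
}
fclose($out);
// Then run the batch-importer against nodes.csv to build the store offline.
```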

Michael Hunger
  • Can you give me an example where data is transferred from MySQL to Neo4j in PHP? It would really help me. – hablema Dec 03 '12 at 17:51
  • Is Neo4j always this slow during insertion? The maximum I could achieve was 100 nodes per second with batch insertion. Is there any other way to insert? I have around 2M nodes to create from my MySQL db, and then relate them. – hablema Dec 04 '12 at 07:44
  • This doesn't seem to be directly Neo4j-related; I have never seen such slow insertion times, and I did a lot of psql->neo4j importing. I guess you are querying the data from MySQL on the go - try to first query the data and insert it into a GraphML or Geoff XML format, then simply use a built-in function such as Gremlin's g.loadGraphML('mysqlexport.xml') - http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-load-a-sample-graph – ulkas Dec 04 '12 at 08:59
  • Neo4j can import up to 1M nodes per second :) using something like the mentioned batch-inserter; it just needs CSV files, which you can easily generate from MySQL, even using PHP. – Michael Hunger Dec 05 '12 at 01:43
1

As mentioned above, the preferred option is the batch importer.

If you need to go through the PHP client, I've put up an example that uses the REST batch API: http://phpfiddle.org/main/code/mu3-sgk

You can test how many rows per batch works best for your system. For my laptop it's 750; for my test server it's 1,250. The json_decode of the response is heavy on the CPU.
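For reference, this is roughly what such a batch request looks like at the plain HTTP level: Neo4j 1.8's REST API exposes a /db/data/batch endpoint that accepts a JSON array of jobs. A stripped-down sketch with cURL (the server address and the shape of $rows are placeholders; the phpfiddle example above does the same with more plumbing):

```php
<?php
// Post one chunk of MySQL rows as node-creation jobs to the batch endpoint.
function insertBatch(array $rows) {
    $jobs = array();
    foreach ($rows as $i => $row) {
        $jobs[] = array(
            'method' => 'POST',
            'to'     => '/node',
            'body'   => array('mysql_id' => $row['id'], 'name' => $row['name']),
            'id'     => $i,
        );
    }

    $ch = curl_init('http://localhost:7474/db/data/batch');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($jobs));
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);

    // Decoding large responses is the CPU-heavy part, so keep batches moderate.
    return json_decode($response, true);
}
```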

Roelb