Neo4j LOAD CSV with Custom IDs

Question

I'm using Neo4j 2.2.3 and have tried to import a CSV file with Cypher's LOAD CSV that looks like this:

PRODUCT_ID,PRODUCT_DESC,PRODUCT_TYPE
99,"A","X"
999,"B","X"
9999,"C","Y"
99999,"D","Y"

However, I'm having difficulty with the custom ID. Neo4j does not import them.

The PRODUCT_ID column is a unique ID that comes from a different system (over which I have no control) but, suffice to say, it's not a row number. The PRODUCT_ID is required to match data from other files.

When I look at the file I see all fields:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
RETURN row

The same is true when I pick any of the columns except PRODUCT_ID:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
RETURN row.PRODUCT_DESC

When I use RETURN row.PRODUCT_ID instead, I get a table of nulls.

Similarly, this does not work:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
CREATE (p:Product { id: toInt(row.PRODUCT_ID),
                    name: row.PRODUCT_DESC,
                    type: row.PRODUCT_TYPE })

I get the products with descriptions and types, but the IDs are lost. I need the IDs to match this CSV file with other files, but I'm stuck and would appreciate some help. I haven't seen any restrictions regarding ID columns in the documentation.

When I move the PRODUCT_ID column away from the first position, it works with the caveat that the then-first column is ignored. So, if I make PRODUCT_ID the last column, PRODUCT_DESC is null. Is it necessary to have row numbers as the first column in a CSV? Seems odd.

Any suggestions?

score 1 · Answer 1 · answered Aug 05 '15 at 13:59

1

Could it maybe be this issue?

http://www.markhneedham.com/blog/2014/10/18/neo4j-load-csv-the-sneaky-null-character/

Basically he had NULL characters in his CSV which was causing the data to be read incorrectly.

answered Aug 05 '15 at 13:59

Brian Underwood

10,746
1
22
34

Nope. It does not even work with the aforementioned four lines of mock data. I tried converting line endings but that did not do anything either. – Ian Aug 06 '15 at 05:14

score 1 · Answer 2 · answered Aug 06 '15 at 12:33

Not sure what the issue is here. I copied the data to a junk.txt file and was able to import without issue.

LOAD CSV WITH HEADERS FROM 'file://junk.txt' as csvLine FIELDTERMINATOR ',' return csvLine;

Note: Also WITHOUT the FIELDTERMINATOR.

The original question asked about the product id..

LOAD CSV WITH HEADERS FROM 'file://junk.txt' as csvLine FIELDTERMINATOR ',' return csvLine.PRODUCT_ID;

score 1 · Answer 3 · answered Aug 06 '15 at 20:23

1

It is probably the Byte Order Mark (BOM) the two bytes at the beginning of the file.

If you remove that (which is invisible in text) it should work.

answered Aug 06 '15 at 20:23

Michael Hunger

41,339
3
57
80

Yes, that's the one. I was reading the [manual](http://neo4j.com/docs/stable/query-load-csv.html#csv-file-format) and checked this [example](http://neo4j.com/docs/stable/cypherdoc-importing-csv-files-with-cypher.html) where nothing is said about the BOM. However, the [dev guide](http://neo4j.com/developer/guide-import-csv/) has it listed as a common pitfall. The manual could have been a bit clearer on this, methinks. – Ian Aug 07 '15 at 05:00

Brian Underwood · Answer 4 · 2015-08-05T13:55:36.550

I'm not quite sure what the problem is, but I have some suggestions to try. You could surround the field names with backticks like this:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
CREATE (p:Product { id: toInt(row.`PRODUCT_ID`),
                    name: row.`PRODUCT_DESC`,
                    type: row.`PRODUCT_TYPE` })

You could also try surrounding the headers with quotes like this:

"PRODUCT_ID","PRODUCT_DESC","PRODUCT_TYPE"

Of course you could remove the header and go by indexes, though that's a bit lame. One way to make it a bit nicer is to do:

LOAD CSV FROM 'file:///path/to/product.csv' AS row
WITH
  toInt(row[0]) AS product_id,
  row[1] AS product_desc,
  row[2] AS product_type
CREATE // etc...

The problem is that the first column is completely ignored. After running your example with the backticks I get the same as what I had before: `Added 4 labels, created 4 nodes, set 8 properties, statement executed in 2901 ms.`. The properties are `name` and `type`. Surrounding the headers with double quotes doesn't help either. — Ian, Aug 05 '15 at 12:48

score 0 · Answer 5 · answered Aug 06 '15 at 05:43

0

So, what worked in the end? I got the files from a Linux system, tried to import them in Windows with the line endings properly modified, but that did not work. Instead, I had to save the files as ANSI instead of UTF-8. I would appreciate it if anyone can tell my what that matters because the doc clearly states that files should be UTF-8 not ANSI, not even on Windows. Anyway, it now works.

answered Aug 06 '15 at 05:43

Ian

1,294
3
17
39

Interesting... Do you mean ASCII, by the way? I wonder if there were UTF-8 characters that Neo4j didn't expect and converting to ASCII removed them... – Brian Underwood Aug 06 '15 at 11:40

Neo4j LOAD CSV with Custom IDs

5 Answers5