3

I'm using Neo4j 2.2.3 and have tried to import a CSV file with Cypher's LOAD CSV that looks like this:

PRODUCT_ID,PRODUCT_DESC,PRODUCT_TYPE
99,"A","X"
999,"B","X"
9999,"C","Y"
99999,"D","Y"

However, I'm having difficulty with the custom ID. Neo4j does not import them.

The PRODUCT_ID column is a unique ID that comes from a different system (over which I have no control) but, suffice to say, it's not a row number. The PRODUCT_ID is required to match data from other files.

When I look at the file I see all fields:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
RETURN row

The same is true when I pick any of the columns except PRODUCT_ID:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
RETURN row.PRODUCT_DESC

When I use RETURN row.PRODUCT_ID instead, I get a table of nulls.

Similarly, this does not work:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
CREATE (p:Product { id: toInt(row.PRODUCT_ID),
                    name: row.PRODUCT_DESC,
                    type: row.PRODUCT_TYPE })

I get the products with descriptions and types, but the IDs are lost. I need the IDs to match this CSV file with other files, but I'm stuck and would appreciate some help. I haven't seen any restrictions regarding ID columns in the documentation.

When I move the PRODUCT_ID column away from the first position, it works with the caveat that the then-first column is ignored. So, if I make PRODUCT_ID the last column, PRODUCT_DESC is null. Is it necessary to have row numbers as the first column in a CSV? Seems odd.

Any suggestions?

Christophe Willemsen
  • 19,399
  • 2
  • 29
  • 36
Ian
  • 1,294
  • 3
  • 17
  • 39

5 Answers5

1

Could it maybe be this issue?

http://www.markhneedham.com/blog/2014/10/18/neo4j-load-csv-the-sneaky-null-character/

Basically he had NULL characters in his CSV which was causing the data to be read incorrectly.

Brian Underwood
  • 10,746
  • 1
  • 22
  • 34
  • Nope. It does not even work with the aforementioned four lines of mock data. I tried converting line endings but that did not do anything either. – Ian Aug 06 '15 at 05:14
1

Not sure what the issue is here. I copied the data to a junk.txt file and was able to import without issue.

LOAD CSV WITH HEADERS FROM 'file://junk.txt' as csvLine FIELDTERMINATOR ',' return csvLine;

Note: Also WITHOUT the FIELDTERMINATOR.

The original question asked about the product id..

LOAD CSV WITH HEADERS FROM 'file://junk.txt' as csvLine FIELDTERMINATOR ',' return csvLine.PRODUCT_ID;

David Bigelow
  • 111
  • 1
  • 5
1

It is probably the Byte Order Mark (BOM) the two bytes at the beginning of the file.

If you remove that (which is invisible in text) it should work.

Michael Hunger
  • 41,339
  • 3
  • 57
  • 80
  • Yes, that's the one. I was reading the [manual](http://neo4j.com/docs/stable/query-load-csv.html#csv-file-format) and checked this [example](http://neo4j.com/docs/stable/cypherdoc-importing-csv-files-with-cypher.html) where nothing is said about the BOM. However, the [dev guide](http://neo4j.com/developer/guide-import-csv/) has it listed as a common pitfall. The manual could have been a bit clearer on this, methinks. – Ian Aug 07 '15 at 05:00
0

I'm not quite sure what the problem is, but I have some suggestions to try. You could surround the field names with backticks like this:

LOAD CSV WITH HEADERS FROM 'file:///path/to/product.csv' AS row
CREATE (p:Product { id: toInt(row.`PRODUCT_ID`),
                    name: row.`PRODUCT_DESC`,
                    type: row.`PRODUCT_TYPE` })

You could also try surrounding the headers with quotes like this:

"PRODUCT_ID","PRODUCT_DESC","PRODUCT_TYPE"

Of course you could remove the header and go by indexes, though that's a bit lame. One way to make it a bit nicer is to do:

LOAD CSV FROM 'file:///path/to/product.csv' AS row
WITH
  toInt(row[0]) AS product_id,
  row[1] AS product_desc,
  row[2] AS product_type
CREATE // etc...
Brian Underwood
  • 10,746
  • 1
  • 22
  • 34
  • The problem is that the first column is completely ignored. After running your example with the backticks I get the same as what I had before: `Added 4 labels, created 4 nodes, set 8 properties, statement executed in 2901 ms.`. The properties are `name` and `type`. Surrounding the headers with double quotes doesn't help either. – Ian Aug 05 '15 at 12:48
0

So, what worked in the end? I got the files from a Linux system, tried to import them in Windows with the line endings properly modified, but that did not work. Instead, I had to save the files as ANSI instead of UTF-8. I would appreciate it if anyone can tell my what that matters because the doc clearly states that files should be UTF-8 not ANSI, not even on Windows. Anyway, it now works.

Ian
  • 1,294
  • 3
  • 17
  • 39
  • Interesting... Do you mean ASCII, by the way? I wonder if there were UTF-8 characters that Neo4j didn't expect and converting to ASCII removed them... – Brian Underwood Aug 06 '15 at 11:40