0

I have a large DB and when I try to execute my query:

https://textuploader.com/1cduz

It takes more than 4 days to finish.

Can you help me to build a new one that may be using APOC?

Rajendra Kadam
  • 4,004
  • 1
  • 10
  • 24
Alex D.
  • 41
  • 6
  • Can you please share How big is the csv file? How many records are there? – Rajendra Kadam May 11 '20 at 18:55
  • It is unusual to have all fields be optional. Is `COALESCE` really needed on every field? – cybersam May 11 '20 at 21:25
  • @Raj there are 1010252 records, the file size is 1,2 gb. – Alex D. May 12 '20 at 07:41
  • @cybersam i think yes because if i don't set it the message of null value is shown – Alex D. May 12 '20 at 07:44
  • OK, but surely not *every* field can have missing values. Also, you should evaluate your CSV data to identify why some fields are missing, the correct way to handle each such field (e.g., skip that entire record), and whether there are issues in how the file is generated (e.g., not all records have the expected number of fields). In other words, it is important to properly cleanse your incoming data so that your DB has reliable and consistent data. Simply replacing all fields that have missing data with some arbitrary value risks producing an unreliable DB. – cybersam May 12 '20 at 17:54

1 Answers1

0

I think you are not using Constraints(or Indexes).

From your query, it looks like workId is a unique key. You should create a unique constraint on it for faster search in the MERGE clause.

CREATE CONSTRAINT unique_work_id
ON (n:Work)
ASSERT n.workId IS UNIQUE

You can also check a few other tips on faster data load in this answer.

Rajendra Kadam
  • 4,004
  • 1
  • 10
  • 24
  • it's taking too long unfortunately – Alex D. May 12 '20 at 08:30
  • 1M records is not a big number, it should not take much time. Can you share what steps did you try and the updated query? – Rajendra Kadam May 12 '20 at 13:15
  • Also if you can profile the query with few records and share the query plan here – Rajendra Kadam May 12 '20 at 13:16
  • i used an example file with just a few records, i just used your Constraint query, applied 2 :parameters query( for COALESCE) and then used the same query i alraedy posted above. Here the link with screenshot from Profiling: https://drive.google.com/open?id=1T0jvkC6bd_3w4h7POo3Ou_c_VLe4Angs – Alex D. May 12 '20 at 16:58
  • You can produce a single PNG file for the plan by just clicking the Export icon (the one that looks like a down-arrow pointing at an inbox) and selecting the "Export PNG" option. It would also be very helpful to see the plan *without* the constraint. – cybersam May 12 '20 at 18:09
  • ok thanks @cybersam, i did it and the link of png is the same drive.google above – Alex D. May 12 '20 at 22:14
  • In your profile output, it shows nodes are searched by label (`NodeByLabelScan`), Constraint/Index is not used. Can you check if the uniqueness index is created on the right property(CASE SENSITIVE)? Use `CALL db.indexes()`, for listing all the indexes. – Rajendra Kadam May 13 '20 at 04:16
  • `NodeUniqueIndexSeek` is expected instead of `NodeByLabelScan` – Rajendra Kadam May 13 '20 at 04:18
  • i updated the above folder with the profile after created costraint – Alex D. May 13 '20 at 09:38