0

I am trying to import huge .csv file into the database. It is Drupal database, I am importing products into the node, content_type_ and uc_products tables. But this is not so relevant as due to the performance optimization I stopped using Drupal functions node_load and/or node_save and I insert/update records directly via SQL commands.

Scenario:

  1. Read whole csv file from local server (approx. 40MB) into the variable (to prevent accessing the file per each record)
  2. Parse it to array per line
  3. Parse every line to fields

  4. Check if the record in database exists (1 SQL select, compare particular field)

  5. If exists, update it (1 SQL update)
  6. If does not exists, insert it (3 inserts, 1 update)

    6.1. If record has image code and it is different from the one in the database, download (curl) the new image file from external server and save it at the local server.

  7. 1 additional delayed insert to the log table

When I run this script, i got 502 Proxy error (Reason: Error reading from remote server) after approx. 10k imported records and the page stops loading. However, the import continues (new records are still being added into tables). This stops after 20k-26k (tried several times) records.

Two questions: What's that Proxy error and why am I getting one? How to optimize my script even more? Is there some other, usual way to import huge amount of data into the MySQL?

Thanks a lot.

petiar
  • 1,047
  • 2
  • 14
  • 31
  • possible duplicate of [Import tool Excel CSV](http://stackoverflow.com/questions/2262343/import-tool-excel-csv) – Gordon Jan 11 '12 at 14:10
  • 1
    For just the SQL stuff, I had to do something similar 2 days ago - over 600,000 records from a CSV file. What I did was write a script to convert the CSV to INSERT statements (in your case, you could use conditional inserts either using a WHERE clause or EXISTS). I ended up with a 47m SQL text file that I then used with `mysql -uuser -ppass dbname – Matt H Jan 11 '12 at 14:17

2 Answers2

0

I think that you have max_request_body_size limit on your squid proxy(or whatever). You should contact proxy administrator to allow bigger POST request body, or divide to smaller parts of CSVs and repeat step for each one.

Or maybe you can check this post : https://serverfault.com/questions/185894/proxy-error-502-reason-error-reading-from-remote-server-with-apache-2-2-3-de

Community
  • 1
  • 1
rkosegi
  • 14,165
  • 5
  • 50
  • 83
0

This sounds like a proxy server in between you and the php code (very possibly the reverse proxy on the server if that is your setup) is giving up on waiting for a response from your script.

While you can probably adjust those timeouts, personally I would try decoupling very long actions like this from the http lifecycle. Not a php developer so don't know what the common pho approaches are.

Frederick Cheung
  • 83,189
  • 8
  • 152
  • 174