
I wanted to ask how best to proceed in the case I describe below:

I insert more than 200,000 rows into a MySQL database via PHP, taking the rows from a CSV file. My question is:

Should I insert the rows as I parse the CSV file, or should I parse the whole file into an array first and then insert the rows into the database with a foreach loop?

I ask because inserting the data into the database is really slow. The table is InnoDB.

I hope I was clear.

N.B. I cannot use "LOAD DATA LOCAL INFILE" because the fields in the .csv file have been mapped previously.

Thank you

enzo
  • Maybe you could use PHP to remap the fields into something that will work with LOAD DATA LOCAL INFILE. – dmgig Mar 14 '16 at 20:05
  • "inclusion of the data in the database is to be really slow": suppose you mean it is slower than you want, or do you really want slow inserts. – Gilbert Mar 14 '16 at 20:05
  • What do you mean by "mapped previously"? If it's just that the data already exists, you can use LOAD DATA with the column headers skipped. If the columns need processing/conversion, you can read into temp variables. – bitfiddler Mar 14 '16 at 20:06
  • What do you mean by "fields in the .csv file have been mapped previously"? `LOAD DATA INFILE` will insert new rows into your table, not remove them. – Terry Mar 14 '16 at 20:08
  • @bitfiddler Asking the same question but you beat me! – Terry Mar 14 '16 at 20:09
  • Thanks for your interest. From your comments I understand that "LOAD DATA INFILE" is the fastest way, so I would create a file containing just the information I need while reading the CSV and then import that new file into my DB with "LOAD DATA INFILE" (see the sketch after these comments). Right? Sorry for my English... – enzo Mar 14 '16 at 20:25
  • Or you could just use phpmyadmin and import the csv? – WheatBeak Mar 14 '16 at 20:26
  • @WheatBeak phpMyAdmin will almost definitely timeout unless steps are taken to allow it extra execution time as it is limited by the php parameter `max_execution_time` – RiggsFolly Mar 14 '16 at 22:07
  • I guess it depends on how much data is in those 200,000 rows, I'm certain I've done more than that. Either way if there is too much data any PHP script is going to timeout. – WheatBeak Mar 15 '16 at 13:07
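A minimal sketch of the approach discussed in the comments, assuming hypothetical file names, connection credentials, a table called mytable, and a three-column remapping (none of these come from the question): remap the CSV with PHP into a temporary file whose columns match the table, then bulk-load it with LOAD DATA LOCAL INFILE.

<?php
// Hypothetical sketch: remap the CSV columns with PHP, write a temporary file
// in the table's column order, then let MySQL bulk-load it.
$in  = fopen('source.csv', 'r');          // original CSV (assumed name)
$out = fopen('/tmp/remapped.csv', 'w');   // remapped file matching the table

while (($row = fgetcsv($in)) !== false) {
    // Example remapping: pick and reorder only the columns the table needs.
    fputcsv($out, [$row[2], $row[0], $row[5]]);
}
fclose($in);
fclose($out);

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // LOCAL INFILE must be allowed client-side
]);

$pdo->exec("
    LOAD DATA LOCAL INFILE '/tmp/remapped.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
");

Note that the MySQL server must also have local_infile enabled for this to work.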

1 Answer


The two options you suggest will be nearly equally fast; the one with the array may be slightly slower because of the additional memory usage.

But I suppose the bottleneck when executing a lot of INSERT statements is the round-trip communication with the database. So a third option that might be faster is to reduce the number of INSERT statements by concatenating the data for many rows into one statement, e.g.:

INSERT INTO mytable VALUES(1,2,3),(4,5,6),(7,8,9), ... (10000, 10001, 10002);

As far as I know, a MySQL statement has a maximum length (limited by the max_allowed_packet setting), so it may be necessary to split the data into multiple chunks.
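As a rough illustration of the chunking idea (assuming a table mytable(a, b, c), a CSV whose rows already have exactly three matching columns, and placeholder file/connection details): group the rows into batches of, say, 1,000 and issue one prepared multi-row INSERT per batch.

<?php
// Hypothetical sketch: batch the CSV rows and issue one multi-row INSERT per
// batch so no single statement grows beyond max_allowed_packet.
function insertChunk(PDO $pdo, array $rows)
{
    // One "(?,?,?)" placeholder group per row, bound as a flat parameter list.
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?,?)'));
    $stmt = $pdo->prepare("INSERT INTO mytable (a, b, c) VALUES $placeholders");
    $stmt->execute(array_merge(...$rows));
}

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$chunkSize = 1000;   // tune so each statement stays below max_allowed_packet
$rows = [];

$fh = fopen('source.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    $rows[] = $row;                   // each row is assumed to have 3 columns
    if (count($rows) === $chunkSize) {
        insertChunk($pdo, $rows);
        $rows = [];
    }
}
if ($rows) {
    insertChunk($pdo, $rows);         // insert the remaining partial batch
}
fclose($fh);

Wrapping each batch (or the whole import) in a transaction reduces the per-statement commit overhead on InnoDB as well.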

This answer makes the performance benefit of consolidating insert statements quite clear: https://stackoverflow.com/a/1793209/1696795

Mario A