1

I have to read approximately 5000 rows of 6 columns max from an xls file. I'm using PHP >= 5.3 and I'll be using PHPExcel for that task. I haven't tried it yet, but I think it can handle it (if you have other options, they are welcome). The issue is that every time I read a row, I need to query the database to verify whether that particular row exists; if it does, overwrite it, and if not, add it. I think that's going to take a lot of time and PHP will simply time out (I can't modify the timeout variable since it's a shared server). Could you give me a hand with this? I appreciate your help.

user2430929
  • 113
  • 5
  • 15

2 Answers

2

Since you're using MySQL, all you have to do is insert the data and not worry about whether a row is already there. Here's why and how:

  • If you query the database from PHP to verify that a row exists before inserting it, that's bad, because you're prone to false results: between the SELECT and the INSERT there's a window in which another process can change the data. PHP can't be used to enforce data integrity; that's the job of the database.

  • To ensure there are no duplicate rows, we use UNIQUE constraints on our columns.

  • MySQL extends the SQL standard with the INSERT INTO ... ON DUPLICATE KEY UPDATE syntax. That lets you just insert data, and if a duplicate row already exists, update it with the new data instead.

  • Reading 5000 rows is quick. Inserting 5000 rows is also quick if you wrap the inserts in a transaction. I would suggest reading 100 rows from the Excel file at a time, starting a transaction and just inserting the data (using ON DUPLICATE KEY UPDATE to handle duplicates), as in the sketch below. That lets you spend one I/O of your hard drive to save 100 records. Doing so, the whole process finishes in a few seconds, so you don't have to worry about performance or timeouts.
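A minimal sketch of that approach using PDO, assuming a hypothetical import table with a UNIQUE key on its sku column (table, column and connection details are illustrative; adapt them to your schema):

    <?php
    // Connect with PDO; exceptions make a failed insert fail loudly.
    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // The UNIQUE key on sku is what lets MySQL detect a duplicate;
    // ON DUPLICATE KEY UPDATE then overwrites the old values in place.
    $stmt = $pdo->prepare(
        'INSERT INTO import (sku, name, price)
         VALUES (:sku, :name, :price)
         ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)'
    );

    // $rows is assumed to be a chunk of ~100 rows read from the spreadsheet.
    $pdo->beginTransaction();
    foreach ($rows as $row) {
        $stmt->execute(array(
            ':sku'   => $row[0],
            ':name'  => $row[1],
            ':price' => $row[2],
        ));
    }
    $pdo->commit(); // one commit = roughly one disk flush for the whole chunk

Repeat that per 100-row chunk: each transaction groups the inserts into a single flush, which is where the speed comes from.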

Mjh
  • 2,904
  • 1
  • 17
  • 16
  • INSERT INTO ... ON DUPLICATE KEY UPDATE is good, but every such query bumps the auto-increment primary key, even when it only updates. Also, 5000 rows in PHPExcel is not quick, and it depends on the hosting – Andrey Vorobyev May 29 '15 at 04:47
  • @AndreyVorobyev - what you're saying is not true at all. I've worked on more than enough projects, so I speak from experience. 5000 inserts are possible within 1 second; you just need the knowledge to use the software properly. 5000 numbers are auto-incremented within a microsecond. – Mjh May 29 '15 at 07:33
  • @AndreyVorobyev - what's your point? I don't see what that has to do with my answer. I answered how to quickly perform the insert and avoid duplicates, exactly what the OP asked. How the data arrives or is read is not in the scope of this question, nor this answer. – Mjh May 29 '15 at 09:14
  • Hey man! @Mjh I didn't know about this method; I found it very practical, thanks! I haven't benchmarked it, but it takes 45 secs to upload the file (600 KB) over a 1 Mb connection, read the data with PHPExcel and store it in arrays, then call my model and loop over it (I have an array of arrays; the inner arrays hold the actual rows for the DB), finally calling the insert function of CodeIgniter's ORM inside the loop for every array. Can I still improve this? – user2430929 Jun 02 '15 at 03:10
  • @user2430929 - have you wrapped the inserts in a transaction or not (see the sketch below)? 45 seconds for 1 MB sounds way, way too long. – Mjh Jun 02 '15 at 07:28
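A minimal sketch of what that comment suggests in CodeIgniter terms, assuming CodeIgniter's database class with a raw query for the ON DUPLICATE KEY UPDATE part (table and column names are illustrative):

    // Inside a CodeIgniter model method; $data is the array of row arrays
    // built from the spreadsheet. trans_start()/trans_complete() wrap all
    // the inserts in one transaction, so there is a single commit at the end.
    $this->db->trans_start();
    foreach ($data as $row) {
        $this->db->query(
            'INSERT INTO import (sku, name, price)
             VALUES (?, ?, ?)
             ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)',
            array($row[0], $row[1], $row[2])
        );
    }
    $this->db->trans_complete(); // commits, or rolls back if a query failed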
-1

First, run this process via exec(), so the request timeout no longer matters.
Second, select all existing rows before reading the Excel file. Don't select them in one query; read 2000 rows at a time, for example, and collect them into an array.
Third, use the xlsx format and a chunked reader, which lets you avoid reading the whole file at once (see the sketch below). It's not a 100% guarantee, but I did the same.
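A minimal sketch of chunked reading with PHPExcel's read filter, assuming a hypothetical data.xlsx with roughly 5000 rows of 6 columns (file name, chunk size and row count are assumptions taken from the question):

    <?php
    require_once 'PHPExcel/IOFactory.php';

    // Read filter that tells PHPExcel to load only the rows of the current chunk.
    class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
    {
        private $startRow = 0;
        private $endRow   = 0;

        public function setRows($startRow, $chunkSize)
        {
            $this->startRow = $startRow;
            $this->endRow   = $startRow + $chunkSize;
        }

        public function readCell($column, $row, $worksheetName = '')
        {
            // Load the heading row plus the rows inside the current window.
            return ($row == 1) || ($row >= $this->startRow && $row < $this->endRow);
        }
    }

    $inputFileName = 'data.xlsx'; // hypothetical file name
    $chunkSize     = 500;
    $totalRows     = 5000;        // assumed from the question

    $reader = PHPExcel_IOFactory::createReader('Excel2007');
    $filter = new ChunkReadFilter();
    $reader->setReadFilter($filter);
    $reader->setReadDataOnly(true);

    for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize) {
        $filter->setRows($startRow, $chunkSize);
        $objPHPExcel = $reader->load($inputFileName);

        $sheet  = $objPHPExcel->getActiveSheet();
        $endRow = min($startRow + $chunkSize - 1, $sheet->getHighestRow());
        $rows   = $sheet->rangeToArray('A' . $startRow . ':F' . $endRow, null, false, false, false);

        // ... insert $rows into the database here ...

        // Free the worksheet before loading the next chunk.
        $objPHPExcel->disconnectWorksheets();
        unset($objPHPExcel);
    }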

Andrey Vorobyev
  • 896
  • 1
  • 10
  • 37