
I have a CSV file of products. Each product needs to be added, or updated if it already exists; old rows should be left alone even if the product has been removed from the list.

I have a loop like this:

while (($data = fgetcsv($handle, 1000, ";")) !== FALSE)

Which works just fine. From each row I assign a few variables like this:

$price = htmlspecialchars(mysql_real_escape_string($data[2]), ENT_QUOTES, 'utf-8' );

After I have set the four variables I need, I query MySQL like this:

mysql_query("SELECT * FROM products WHERE productid = '$prodid' AND supplier = 'SUPPLIER1'")

If it returns one or more rows, we just update it:

            mysql_query("UPDATE tuotteet SET 
                        supplier = 'SUPPLIER1',
                        product = '$product',
                        prodid = '$prodid',
                        ean = '$ean',
                        price = '$price' WHERE prodid= '$prodid' AND supplier = 'SUPPLIER1'") or die(mysql_error());

If the product is not found in the database, we make another query, an INSERT. The problem is that this is a very slow way to do it: it takes many minutes to go through about 10,000 product lines.
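In outline, the per-row logic is something like this (simplified; the INSERT column list is illustrative, and the real code escapes every field as shown above):

    $result = mysql_query("SELECT * FROM products WHERE productid = '$prodid' AND supplier = 'SUPPLIER1'");
    if (mysql_num_rows($result) > 0) {
        // the UPDATE query shown above
    } else {
        mysql_query("INSERT INTO tuotteet (supplier, product, prodid, ean, price)
                     VALUES ('SUPPLIER1', '$product', '$prodid', '$ean', '$price')") or die(mysql_error());
    }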

Is there anything I can do about this?

  • You could try pre-caching data you know, then do the comparison in PHP against the pre-cached data, and update if needed using a bulk query. – Diamondo25 Mar 20 '14 at 14:56
  • Why not use `INSERT.... ON DUPLICATE KEY....` rather than SELECT followed by INSERT or UPDATE? – Mark Baker Mar 20 '14 at 14:56
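For reference, a rough sketch of what that suggestion could look like for this table, assuming a UNIQUE key exists on (prodid, supplier) (table and column names taken from the question):

    INSERT INTO tuotteet (supplier, product, prodid, ean, price)
    VALUES ('SUPPLIER1', '$product', '$prodid', '$ean', '$price')
    ON DUPLICATE KEY UPDATE
        product = VALUES(product),
        ean     = VALUES(ean),
        price   = VALUES(price);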

3 Answers


Before starting the process, I would query the database for all the product IDs it already contains and store them in a dictionary.
Then start processing the CSV file: for every record, just check whether the dictionary has a key for the current product ID.

This way you avoid going to the database for every record in the CSV.
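A rough sketch in PHP (table and column names are taken from the question; the CSV column index and the abbreviated INSERT/UPDATE bodies are placeholders):

    // Pre-load every prodid for this supplier into an associative array.
    $known = array();
    $res = mysql_query("SELECT prodid FROM tuotteet WHERE supplier = 'SUPPLIER1'");
    while ($row = mysql_fetch_assoc($res)) {
        $known[$row['prodid']] = true;
    }

    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
        $prodid = mysql_real_escape_string($data[1]);   // adjust the index to your file
        if (isset($known[$prodid])) {
            // UPDATE tuotteet SET ... WHERE prodid = '$prodid' AND supplier = 'SUPPLIER1'
        } else {
            // INSERT INTO tuotteet (...) VALUES (...)
            $known[$prodid] = true;   // so a later duplicate row in the CSV hits the UPDATE branch
        }
    }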

Esteban 8A
  • I like this approach, combined with Diamondo25's recommendation for a bulk query. See the accepted answer here for an example of how to do a bulk INSERT: http://stackoverflow.com/questions/12960569/mysql-bulk-insert-via-php – Dan Bechard Mar 20 '14 at 21:05
  • This works unless the size of the CSV file exceeds your PHP max memory limit. – Bill Karwin Mar 20 '14 at 22:14
  • You're right, and if that's the case you can always change it in php.ini. – Esteban 8A Mar 21 '14 at 12:26

I had a similar problem. The best solution is to build a single query. Here is the code in Python:

# data to upload, as a dictionary {id: new_value}
data = {2: 1001, 4: 251}

ids = []
sql = "UPDATE tabla1 SET value = CASE id "

for key in data:
    sql = sql + 'WHEN ' + str(key) + ' THEN ' + str(data[key]) + '\n\t'
    ids.append(key)
ids_string = ', '.join(map(str, ids))
sql = sql + "END WHERE id IN (" + ids_string + ")"
print(sql)
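For the example dictionary above, the script prints something like:

    UPDATE tabla1 SET value = CASE id WHEN 2 THEN 1001
        WHEN 4 THEN 251
        END WHERE id IN (2, 4)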
  • I would use the phrase "one solution" as opposed to "the best solution". – Dan Bechard Mar 20 '14 at 15:11
  • so tell me, what do you think is the best solution? or a better one? – user3441996 Mar 20 '14 at 17:14
  • "Best solution" is not a term anyone should use when answering such an open-ended question, because there is no clear-cut best. I think Esteban's ID cache is better than a dynamic SQL approach. I also think your single-query solution will fail miserably for anything more than a hundred or so records due to bandwidth and timeout constraints. – Dan Bechard Mar 20 '14 at 20:54
  • Your code example is also quite flawed. It doesn't compile in Python nor does it produce sane SQL. In particular, the line `sql = sql + "WHEN %d THEN %d ", key, key` makes no sense. If you could fix your example so that it compiles and post some sample output that would be helpful. – Dan Bechard Mar 20 '14 at 21:00
  • Sorry, that line was commented out; just comment it out again and everything will work. (It's edited.) I have tested it with hundreds of records and it works fine. If we want to update thousands of rows (which is quite improbable), we can divide it into several 'single queries'. – user3441996 Mar 20 '14 at 22:11
  • Depending on the size of the dict, this could generate a huge SQL query with a length that exceeds `max_allowed_packet`. – Bill Karwin Mar 20 '14 at 22:15

It would be a lot faster, more efficient, and require less code to use LOAD DATA:

LOAD DATA LOCAL INFILE 'myinput.csv'
 REPLACE INTO TABLE tuotteet

You just have to make sure you have a PRIMARY KEY or UNIQUE KEY defined over columns (prodid, supplier). The LOAD DATA will use that to tell if the row already exists and needs to be updated, or if the row doesn't exist and needs to be inserted as a new row.

Then you don't need to use fgetcsv(), you don't need to do any escaping, you don't need to SELECT or UPDATE or INSERT, and it should run about 10x faster.
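A fuller version for a semicolon-delimited file might look like this (the column order and character set are assumptions; adjust the column list to match the CSV):

    LOAD DATA LOCAL INFILE 'myinput.csv'
    REPLACE INTO TABLE tuotteet
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\n'
    (product, prodid, ean, price)
    SET supplier = 'SUPPLIER1';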

Bill Karwin