3

This is my db structure:

ID    NAME    SOMEVAL    API_ID
1     TEST    123456     A123
2     TEST2   223232     A123
3     TEST3   918922     A999
4     TEST4   118922     A999

I'm filling it using a function that calls an API and gets some data from an external service.

The first run, I want to insert all the data I get back from the API. After that, each time I run the function, I just want to update the current rows and add rows in case I got them from the API call and are not in the db.

So my initial thought regarding the update process is to go through each row I get from the API and SELECT to see if it already exists.

I'm just wondering if this is the most efficient way to do it, or maybe it's better to DELETE the relevant rows from the db and just re-inserting them all.

NOTE: each batch of rows I get from the API has an API_ID, so when I say delete the rows i mean something like DELETE FROM table WHERE API_ID = 'A999' for example.

Or Weinberger
  • 7,332
  • 23
  • 71
  • 116
  • I don't see any problems in just doing a SELECT first, it would accomplish your goals, that way you can do data verification on the data before you modify it. I wouldn't outright delete any row without first doing some verification. You could do a blind update and detect if it succeeded if you want to save queries, but that might have its own cost. – Resorath Feb 27 '12 at 21:59
  • possible duplicate of: http://stackoverflow.com/questions/4854863/mysql-update-vs-insert-and-delete – Nir Alfasi Feb 27 '12 at 22:01

5 Answers5

0

If you retrieving all the rows from the service i recommend you the drop all indexes, truncate the table, then insert all the data and recreate indexes.

If you retrieving some data from the service i would drop all indexes, remove all relevant rows, insert all rows then recreate all indexes.

Peter Kiss
  • 9,309
  • 2
  • 23
  • 38
0

If you have all of the data returned from the API that you need to completely reconstruct the rows after you delete them, then go ahead and delete them, and insert afterwards.

Be sure, though, that you do this in a transaction, and that you are using an engine that supports transactions properly, such as InnoDB, so that other clients of the database don't see rows missing from the table just because they are going to be updated.

For efficiency, you should insert as many rows as you can in a single query. Much faster that way.

BEGIN;
DELETE FROM table WHERE API_ID = 'A987';
INSERT INTO table (NAME, SOMEVAL, API_ID) VALUES
    ('TEST5', 12345, 'A987'),
    ('TEST6', 23456, 'A987'),
    ('TEST7', 34567, 'A987'),
    ...
    ('TEST123', 123321, 'A987');
COMMIT;
Ian Clelland
  • 43,011
  • 8
  • 86
  • 87
  • Not sure I made it clear.. The API call get's an API_ID and return only rows belonging to that API_ID. so when I update, I don't need to truncate the table I just need to delete rows that have a specific API_ID – Or Weinberger Feb 27 '12 at 22:03
  • No, I missed the bit where an API_ID is associated with a group of rows, not just a single row. I thought that the API was returning several rows, all with different API_IDs, and you were planning to delete them one-at-a-time. – Ian Clelland Feb 27 '12 at 22:13
0

In such scenarios I'm usually going with:

  • start transaction
  • get row from external source
  • select local store to check if it's there
    • if it's there: update its values, remember local row id in list
    • if it's not there: insert it, remember local row id in list
  • at the end delete all rows that are not in remembered list of local row ids (NOT IN clause if the count of ids allows for this, or other ways if it's possible that there will be many deleted rows)
  • commit transaction

Why? Because usually I have local rows referenced by other tables, and deleting them all would break the references (not to mention deletete cascade).

Furgas
  • 2,729
  • 1
  • 18
  • 27
0

I don't see any problem in performing SELECT, then deciding between an INSERT or UPDATE. However, MySQL has the ability to perform so-called "upserts", where it will insert a row if it does not exist, or update an existing row otherwise.

This SO answer shows how to do that.

Community
  • 1
  • 1
bfavaretto
  • 71,580
  • 16
  • 111
  • 150
0

I would recommend using INSERT...ON DUPLICATE KEY UPDATE.

If you use INSERT IGNORE, then the row won't actually be inserted if it results in a duplicate key on API_ID.

Add unique key index on API_ID column.