The project I'm working on currently has a categories table, as well as a business_category table (an association between a business and its categories). I want to run a query that will update the values in business_category with the new associations, and keep the ones that exist currently.

I know I can run a DELETE statement to clear the associations first and then do an INSERT. My question is, is there a better way to do it? Is delete-then-insert reasonable performance-wise? I imagine this query might be called fairly often, and it seems a bit extreme to delete and re-insert every time it's run, when really all I want to do is insert the record if it doesn't already exist.

For table structure, business_category has just two columns: business_id and category_id.

Anybody got any ideas? Should I just go ahead and do the delete? Or is there a better way?

Thanks in advance.

James Spence
  • When you say you want to "run a query that will update the values in the business_category with the new associations, and keep the ones that exist currently", will there be a case when an existing association is removed? – cwurtz Mar 10 '15 at 20:27
  • @CJWurtz I hadn't thought of that, but yes, I suppose this is essentially an insert and replace. Huh. So delete then insert is probably the best option, based on that? Right? – James Spence Mar 10 '15 at 20:36
  • See this question: http://stackoverflow.com/questions/548541/insert-ignore-vs-insert-on-duplicate-key-update – Peter Bowers Mar 10 '15 at 20:53

1 Answer

I see two options, both of which should perform better than always deleting everything and then inserting the updated data.

1) First select all category_ids for the business_id being updated.

2) From that list, determine which category_ids need to be removed, and delete only those.

3) Determine which category_ids in the new list are missing from it, and insert only those.

4) Anything left over is unchanged, so it doesn't need to be touched.
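As a rough illustration of those four steps, here is a minimal sketch in Python. It uses the standard-library sqlite3 module with an in-memory database purely so it runs anywhere; the question is about MySQL, where the same SELECT / DELETE / INSERT statements would go through your own driver. The function name `sync_categories`, the composite primary key, and the sample ids are all assumptions made for the example, not something from the question.

```python
import sqlite3


def sync_categories(conn, business_id, new_category_ids):
    """Write only the differences between stored and wanted categories."""
    wanted = set(new_category_ids)

    # Step 1: read the current associations for this business.
    rows = conn.execute(
        "SELECT category_id FROM business_category WHERE business_id = ?",
        (business_id,),
    ).fetchall()
    existing = {row[0] for row in rows}

    # Steps 2 and 3: work out what to remove and what to add.
    to_delete = existing - wanted
    to_insert = wanted - existing

    # Step 4 is implicit: rows in both sets are never touched.
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "DELETE FROM business_category"
            " WHERE business_id = ? AND category_id = ?",
            [(business_id, c) for c in to_delete],
        )
        conn.executemany(
            "INSERT INTO business_category (business_id, category_id)"
            " VALUES (?, ?)",
            [(business_id, c) for c in to_insert],
        )
    return to_insert, to_delete


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_category ("
    " business_id INTEGER NOT NULL,"
    " category_id INTEGER NOT NULL,"
    " PRIMARY KEY (business_id, category_id))"
)
conn.executemany(
    "INSERT INTO business_category VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (2, 10)],
)

# Business 1 goes from {10, 20, 30} to {20, 30, 40}.
added, removed = sync_categories(conn, 1, [20, 30, 40])
```

In this run only two rows are written (one delete, one insert); the rows for categories 20 and 30, and everything belonging to business 2, are left alone.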

Or you can:

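1) Run an INSERT query with "ON DUPLICATE KEY UPDATE category_id=category_id" (see the MySQL docs on INSERT ... ON DUPLICATE KEY UPDATE). This relies on a primary or unique key covering (business_id, category_id); rows that already exist are then left as they are instead of raising a duplicate-key error.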

2) Run a DELETE query for that business_id that removes every row whose category_id is NOT in the updated list. This removes any existing associations that are no longer wanted. ("DELETE .. WHERE category_id NOT IN ($list_of_categories)")
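Here is a hedged sketch of that second option, again in Python with the standard-library sqlite3 module so that it is runnable as-is. Note the substitution: SQLite has no ON DUPLICATE KEY UPDATE, so the sketch uses SQLite's `INSERT OR IGNORE`, which likewise skips rows that already exist; on MySQL you would write `INSERT ... ON DUPLICATE KEY UPDATE category_id = category_id` instead. The function name `replace_categories`, the composite primary key, and the sample data are assumptions for illustration.

```python
import sqlite3


def replace_categories(db, business_id, new_category_ids):
    """Insert missing associations, then delete the ones no longer wanted."""
    ids = list(dict.fromkeys(new_category_ids))  # de-duplicate, keep order

    with db:  # both statements commit or roll back together
        # 1) Insert; existing (business_id, category_id) pairs are skipped.
        #    MySQL equivalent: INSERT ... ON DUPLICATE KEY UPDATE
        #    category_id = category_id
        db.executemany(
            "INSERT OR IGNORE INTO business_category"
            " (business_id, category_id) VALUES (?, ?)",
            [(business_id, c) for c in ids],
        )

        # 2) Delete rows for this business that are not in the new list.
        if ids:
            placeholders = ", ".join("?" * len(ids))
            db.execute(
                "DELETE FROM business_category"
                " WHERE business_id = ?"
                f" AND category_id NOT IN ({placeholders})",
                [business_id, *ids],
            )
        else:
            # An empty NOT IN () list is a syntax error in MySQL, so an
            # empty new list is handled as "remove everything".
            db.execute(
                "DELETE FROM business_category WHERE business_id = ?",
                (business_id,),
            )


# Hypothetical schema and data, for demonstration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE business_category ("
    " business_id INTEGER NOT NULL,"
    " category_id INTEGER NOT NULL,"
    " PRIMARY KEY (business_id, category_id))"
)
db.executemany(
    "INSERT INTO business_category VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (2, 10)],
)

replace_categories(db, 1, [20, 30, 40])
```

The placeholders are generated per id so the list is bound as parameters rather than pasted into the SQL string, which is worth doing in whatever driver you use for the NOT IN list.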

In the end, you basically want to reduce how much you write, because every write means the table's index has to be updated. Doing a large number of writes will be slower than doing a read and writing only what has actually changed.

Hope that helps

cwurtz