-2

Given the two tables below, each ball can have one or more colors. What would be the most efficient way to update the ball_color_mapping table when changes are made in a UI?

For example, ball_id = 1 had color_ids of 1,2,3 and after an update should only have color_id = 2.

The possible options I see:

  1. Delete all rows matching the ball_id then insert the new values.
  2. Fetch the existing color_ids, compare new vs old values and delete/insert.

Option 1 seems like the simplest but would also result in deleting and inserting if the data didn't change. Option 2 would add more operations and complexity since the data needs to be fetched then deleted based on the criteria and new values would be inserted (3 operations + logic vs 2 operations)

For others that encountered this, which option would be more efficient and are there better methods?

ball table:

ball_id name brand_id
1 tennis ball 1
2 pool ball 2
3 ping pong ball 1

ball_color_mapping table:

ball_id color_id
1 1
1 2
1 3
2 1
3 2
3 3
MCSharp
  • 1,068
  • 1
  • 14
  • 37
  • How many rows will be in each table? What are the `SELECTs` you will need to run? – Rick James Feb 06 '22 at 02:36
  • The `ball_color_mapping table` will have as many rows as there are colors associated with the `ball_id`. The `SELECT` will return the `ball_table` + all color_ids or names associated with the ids in either a string or JSON obj that can be parsed. – MCSharp Feb 22 '22 at 17:00

2 Answers2

1

I'd do it this way:

DELETE FROM ball_color_mapping WHERE ball_id = ? AND color_id NOT IN (?, ?, ?);

INSERT INTO ball_color_mapping VALUES (?, ?), ... 
ON DUPLICATE KEY UPDATE color_id=VALUES(color_id);

That handles cases where a color is removed from the set of colors for a ball, then it inserts where necessary, and does a no-op UPDATE if the color is already in the database.

This assumes the pair of columns in ball_color_mapping is its primary key, to trigger the ON DUPLICATE clause.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • The `VALUES` function in `ON DUPLICATE KEY UPDATE color_id=VALUES(color_id)` is deprecated and should be updated to the newer structure: https://stackoverflow.com/questions/63609570/mysql-values-function-is-deprecated – MCSharp Feb 04 '22 at 15:57
-1

Somewhere you have a mapping between color_id and color? I would consider getting rid of that. Unless there is some unspoken reason for that 'normalization', I suggest that it is "over-normalization".

Since the delete and insert tasks are simple and quick, and since you will have the appropriate indexes (you will, won't you?), then do whatever is simplest. That is probably doing these two steps always:

  1. DELETE all the colors for the ball being edited
  2. INSERT the new set of colors.

What will you ultimately do with the colors? If you will simply regurgitate them upon request, then they could be stored in a simple VARCHAR(191) column in the Balls table. If you need to "find all 'blue' balls", the it would be better to have one row per color. (And, again, I suggest simply 'blue', not a color_id.)

If you are not talking about hundreds of colors on a ball or thousands of balls, then consider the FIND_IN_SET('blue', "red,white,black,blue") type test. (It will require checking all rows, but for a small dataset, that is tolerable.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I appreciate your suggestion. I think there is value in mapping the values for several reasons. 1) The output of all colors can be changed based on the query. What if you need this structure for the client `[{id: 1, name: 'red'},{id: 2, color: 'blue'}]`? 2)Preventing data duplication. Why would you store many similar strings if you don't need to? 3)Handling duplicates without additional logic. You'd need to split the string first, then check if a value exists, then modify it in your app. 4)Easy querying for statistics. It would be much simpler and faster to search for a specific color id. – MCSharp Feb 22 '22 at 17:50
  • Do you ever change all occurrences of "red" to "purple"? As for data duplication, the 4-byte `INT` id is being duplicated instead of the 5-byte "blue" string (4 chars+ length); what is the goal of deduplication? – Rick James Feb 22 '22 at 18:05
  • `SELECT color, COUNT(*) FROM t GROUP BY color` would say how many blue balls there are (etc). Yes, this is nearly impossible if they are stored as `'"red,blue"`. So avoid this approach. – Rick James Feb 22 '22 at 18:07
  • The balls and mapping in my question was just an example. In an app with many users that can have many products, and each user can several colors per product, tying everything to a color makes sense. For example, toy collectors can have cars, trucks, motorcycles. Each collector can have more than one vehicle but in different colors. What if you want to name the colors like "blue steel" and then change the name? When looking at a large scale, it makes sense to use a mapping table for simplicity and not repeat lengthy strings. – MCSharp Feb 22 '22 at 18:48
  • @MCSharp - I apologize. I can only answer the question as it is posed. I tried to go beyond the base question, but I failed to go in that direction. In any case, I hope you have some more insight from Bill and me. (Ditto for the 29+ others that have, or will, read this thread.) – Rick James Feb 22 '22 at 19:54