I'm saving the results from my Tensorflow image classifier in a SQL database. I have 3 tables. Images, Categories and a table joining the two with weight variables. Some images have no relationships and some have lots.
The problem is that I have duplicate rows in the image table that need to be removed. But I need to preserve the many to many relationships, if the duplicated image has one or more.
Here's an example:
Table name: my_images
+----+------------+-----------------+
| ID | image_path | image_filename |
+----+------------+-----------------+
| 1 | Film 1 | Film 1 001.jpg |
| 2 | Film 1 | Film 1 001.jpg |
| 3 | Film 1 | Film 1 002.jpg |
| 4 | Film 1 | Film 1 002.jpg |
| 5 | Film 1 | Film 1 003.jpg |
| 6 | Film 1 | Film 1 003.jpg |
+----+------------+-----------------+
Table name: my_terms
+---------+------------+
| term_id | term_name |
+---------+------------+
| 1 | cat |
| 2 | dog |
| 3 | automobile |
+---------+------------+
Table name: my_term_relationships
+----------+---------+---------+
| image_id | term_id | weight |
+----------+---------+---------+
| 2 | 1 | 0.58516 |
| 2 | 3 | 0.16721 |
| 3 | 2 | 0.21475 |
+----------+---------+---------+
So in this example, the ideal outcome would be to delete rows 1, 4 and either row 5 or 6 from my_images.