Most effecient way to insert data into an existing mySQL table from a python dataframe but if column2 and column 3 contain dupes, update existing

Question

I'm relatively new to SQL and python. I'm trying to write code that will handle updates to an existing table in SQL. I've made an example below.

id  Project Company Start Date  Industry
1   Zebra   Apple   1/2/2022    Software
2   Charlie Tesla   2/2/2022    Automotive
3   Alpha   Google  3/2/2022    Software
4   Omega   Facebk  4/2/2022    Social Media
5   Beta    Twitter 1/2/2022    Social Media

I'm currently reading a named range from an Excel workbook that will contain the same five columns as my existing mySQL table. I'd like to insert the new data into existing table, called porjects - however, lets say someone updates just the start date for a specific company/project, I dont want a row created.

My thinking was to iterate through my new dataframe rows and, for each row, search for any columns in my existing table that match both the project name and company name (just in case someone reuses a project name without knowing) and drop those rows from the existing table. Once that's done, use pandas.to_sql() to append the data.

I'm not sure if this is the most efficient way to go about this.

I had seen a previous solution to a similar problem suggest something like this. Am I on the right track or am I better trying something like the below?

INSERT INTO TABLE_2
  (id, name)
SELECT t1.id,
       t1.name
  FROM TABLE_1 t1
 WHERE t1.id NOT IN (SELECT id
                       FROM TABLE_2)

See also https://stackoverflow.com/questions/15383852/sql-if-exists-update-else-insert-into and https://stackoverflow.com/questions/40223927/insert-or-update-if-exists-in-mysql-using-pandas — Nick ODell, Feb 02 '23 at 17:07
So i've seen this block before INSERT INTO table (id, name, age) VALUES(1, "A", 19) ON DUPLICATE KEY UPDATE name="A", age=19. What I'm not sure about is lets say I do that for all of my columns, , if there is a duplicate in the start date, but not in the project name or company name, wouldn't this update the start date for a different project? — novawaly, Feb 02 '23 at 17:17
and similarly, if i only say ON DUPLICATE KEY for only the first two columns, does that mean it will update just the project name and company and nothing else? I guess I'm not clear on how the on duplicate works. — novawaly, Feb 02 '23 at 17:21
does on duplicate key mean it's only looking for a duplicate in the key column (which would be my id columns)? — novawaly, Feb 02 '23 at 18:18
That, or any other column you designate as unique: https://www.javatpoint.com/mysql-unique-key — Nick ODell, Feb 02 '23 at 18:21

Most effecient way to insert data into an existing mySQL table from a python dataframe but if column2 and column 3 contain dupes, update existing

0 Answers0