I'm relatively new to SQL and python. I'm trying to write code that will handle updates to an existing table in SQL. I've made an example below.
id Project Company Start Date Industry
1 Zebra Apple 1/2/2022 Software
2 Charlie Tesla 2/2/2022 Automotive
3 Alpha Google 3/2/2022 Software
4 Omega Facebk 4/2/2022 Social Media
5 Beta Twitter 1/2/2022 Social Media
I'm currently reading a named range from an Excel workbook that will contain the same five columns as my existing mySQL table. I'd like to insert the new data into existing table, called porjects - however, lets say someone updates just the start date for a specific company/project, I dont want a row created.
My thinking was to iterate through my new dataframe rows and, for each row, search for any columns in my existing table that match both the project name and company name (just in case someone reuses a project name without knowing) and drop those rows from the existing table. Once that's done, use pandas.to_sql()
to append the data.
I'm not sure if this is the most efficient way to go about this.
I had seen a previous solution to a similar problem suggest something like this. Am I on the right track or am I better trying something like the below?
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
WHERE t1.id NOT IN (SELECT id
FROM TABLE_2)