I am building a database from scraped data. I want to only insert records which are not yet in my database and disregard all the duplicate records.
Following is my code
conn = sqlite3.connect('Database.db')
df.to_sql("My_cool_data_Temp", conn, if_exists='replace', index=False)
query = "SELECT * FROM My_cool_data_Temp EXCEPT SELECT * FROM My_cool_data;"
new_entries = pd.read_sql(query, conn)
new_entries.to_sql("My_cool_data", conn, if_exists='append', index=False)
conn.execute("DROP TABLE My_cool_data_Temp;")
This code gives me the following error:
sqlite3.IntegrityError: UNIQUE constraint failed:
Printing out new_entries
I saw that for some reason the query doesn't seem to work as I get duplicates.
Any ideas what I might miss? I would be also glad for different approaches to my problem.
Thank you very much!