I have a DataFrame named `review` that looks like this:
| usuarios | fecha | calificacion | comentario | branch_id |
|---|---|---|---|---|
| Cinthya De Sousa | 2022-09-20 | 1 estrella | Empleados que te atienden con poca naturalidad. | 53 |
| Luana Antonella Gomez Lucero | 2022-08-23 | 1 estrella | Desastrosa atencion por parte de una mujer rub. | 53 |
I need to export the df content to a table in my SQL Server database, so I do this:
```python
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

# note: params is built for a pyodbc-style connection string but is never used;
# the engine below actually connects through pymssql
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                                 "SERVER=xxx.xx.x.x;"
                                 "DATABASE=xxxx;"
                                 "UID=xxxx;"
                                 "PWD=xxxx")
engine = create_engine('mssql+pymssql://xxxx:xxxxx@xxx.xx.x.x/xxxxxx')
review.to_sql("Google_Reviews_sucursales", engine, if_exists='append', chunksize=1000)
```
Everything works fine: when I check my database, all the rows are there (98 in the real df). The problem is that this df changes every day, and when I run the script again, the rows that already exist in my SQL Server table are exported again and duplicated. What could I do to avoid exporting duplicate data to the table?

The df is replaced every day and only contains the last 10 Google reviews for a commerce. If none of those 10 records are in the SQL table yet, appending is fine, but some reviews will probably overlap: if tomorrow I fetch the last 10 reviews and 5 of them already exist in the table, I will be duplicating data. Is it possible to add rows only if they don't already exist in the table?
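In case it helps clarify what I'm after, here is a rough pandas-only sketch of the filtering I would like to happen before calling `to_sql`. The sample frames are made up, and treating the pair `usuarios`/`fecha` as a natural key is my assumption:

```python
import pandas as pd

# rows already present in the SQL table (hypothetical sample,
# in practice these could come from pd.read_sql on the target table)
existing = pd.DataFrame({
    "usuarios": ["Cinthya De Sousa", "Luana Antonella Gomez Lucero"],
    "fecha": ["2022-09-20", "2022-08-23"],
})

# today's scrape: one duplicate row and one genuinely new review
review = pd.DataFrame({
    "usuarios": ["Cinthya De Sousa", "Nuevo Usuario"],
    "fecha": ["2022-09-20", "2022-09-21"],
})

# anti-join: keep only rows whose (usuarios, fecha) pair is not in the table yet
merged = review.merge(existing, on=["usuarios", "fecha"],
                      how="left", indicator=True)
new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# only new_rows would then be passed to to_sql(..., if_exists='append')
print(new_rows)
```

So only the "Nuevo Usuario" row would be appended; is something like this (or a server-side equivalent) the right way to do it?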