0

I have a df named "review" like this :

usuarios fecha calificacion comentario branch_id
Cinthya De Sousa 2022-09-20 1 estrella Empleados que te atienden con poca naturalidad. 53
Luana Antonella Gomez Lucero 2022-08-23 1 estrella Desastrosa atencion por parte de una mujer rub. 53

I need to export df content to a table in my sql server database so I do this_:

params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                                     "SERVER=xxx.xx.x.x;"
                                     "DATABASE=xxxx;"
                                     "UID=xxxx;"
                                     "PWD=xxxx")
engine = create_engine('mssql+pymssql://xxxx:xxxxx@xxx.xx.x.x/xxxxxx')

review.to_sql("Google_Reviews_sucursales", engine, if_exists='append',chunksize = 1000)

everything works fine. When I go to my database I have all the lines exported (98 in the real df). The problem is that this df will be changing everyday, and when I run the script again, the rows that already exists in my sqlserver database are exported again and duplicated... waht could I do to avoid exporting duplicate data to the table? . The df will be replaced everyday and will have only the last 10 google reviews from a commerce. If these 10 records are not in the sql table, it will be fine, but probably some reviews are duplicated so, next day, if I get the last 10 reviews and 5 already exists in the table, I will be duplicating the data. Is it possible to add rows, only if they don´t already exist in the table?

Maximiliano Vazquez
  • 196
  • 1
  • 2
  • 12

0 Answers0