I have a script that inserts a DataFrame into a table using executemany().

The problem is that this table has an ID column as its primary key, and sometimes the script tries to insert a row whose ID already exists.

I would like to know if there is an easy way to handle this kind of exception and let executemany() continue with the remaining rows.

The alternative I was considering is to check which IDs from the DataFrame are already in the table and remove those rows before inserting, but I don't know whether that would perform well.
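The pre-filtering alternative described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in so it runs without a SQL Server instance (the `?` placeholder style is the same one used in the question's code), and the table `comments`, its two columns, and the helper `split_new_rows` are hypothetical names, not part of the original script.

```python
import sqlite3


def split_new_rows(cursor, rows, table, id_column, id_index=0, chunk_size=500):
    """Return the rows whose ID is neither in `table` nor repeated earlier in `rows`."""
    ids = [row[id_index] for row in rows]
    existing = set()
    # Look the IDs up in chunks so the IN (...) list stays below
    # the driver's limit on the number of parameters per statement.
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        cursor.execute(
            f"SELECT {id_column} FROM {table} WHERE {id_column} IN ({placeholders})",
            chunk,
        )
        existing.update(r[0] for r in cursor.fetchall())

    seen = set()
    new_rows = []
    for row in rows:
        key = row[id_index]
        # Skip IDs already stored and IDs duplicated inside the batch itself.
        if key not in existing and key not in seen:
            seen.add(key)
            new_rows.append(row)
    return new_rows


# Demo with a hypothetical two-column table (stand-in for the real one).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, text TEXT)")
cur.execute("INSERT INTO comments VALUES ('a', 'already stored')")

rows = [
    ("a", "duplicate of stored row"),
    ("b", "new"),
    ("b", "repeated in batch"),
    ("c", "new"),
]
new_rows = split_new_rows(cur, rows, "comments", "id")
cur.executemany("INSERT INTO comments VALUES (?, ?)", new_rows)
conn.commit()
```

With a DataFrame, `rows` would be the list of tuples built from `df`. Note that this check is not atomic: if another process inserts the same ID between the lookup and the insert, the primary key violation can still occur.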

My code:

params = (tuple(row) for _, row in df.iterrows())
sql = '''INSERT INTO stilingue.stalker_comments values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)'''
start = time.time()
try:
    self.cursor.executemany(sql, params)
    self.conn.commit()
except Exception as e:
    print(e)
    self.conn.rollback()
    print('Something went wrong...')
end = time.time()
print('Execution time: {0:.2f} seconds.'.format(end-start))

DataFrame:

    channel followers   gender  hashtags    interactions    likes   location    mentions    name    page_comment    ... text    themes  uid user_image_url  user_url    username    verified    videoplays  business    rt_count
0   Inbox do Facebook   0   Não Definido        0   0           Midiam Mendes   False   ... Sacanagem isso né?? Poorq vocês dizeram que o ...       1995608377159933    https://storage.googleapis.com/usersstilingue/...           False   0   Itaú    0
1   Inbox do Facebook   0   Não Definido        0   0           Midiam Mendes   False   ... Eu tenho provas , e posso processar vocês!!     1995608377159933    https://storage.googleapis.com/usersstilingue/...           False   0   Itaú    0
2   Inbox do Facebook   0   Não Definido        0   0           Midiam Mendes   False   ... Isso é um absurdo       1995608377159933    https://storage.googleapis.com/usersstilingue/...           False   0   Itaú    0

Traceback:

('23000', "[23000] [Microsoft][ODBC SQL Server Driver][SQL Server]Violation of PRIMARY KEY constraint 'PK__stalker___DD37D91A4691B0F7'. Cannot insert duplicate key in object 'stilingue.stalker_comments'. The duplicate key value is (m__g64-pbys7OlEvp8xmfyktlNIHrUPQPiNrcKrPVOF_Lj84OJfN4WtAJ92lj7YnzAOQ1B7EDCJf85k_UcwB0-4Q). (2627) (SQLExecDirectW); [23000] [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated. (3621)")
Gord Thompson
Lucas Hort

1 Answer

If your data is not large, the simplest way is to create a temporary table in your database that doesn't have a PK. Insert the data into that temporary table, remove the duplicates from it (on SQL Server you can use the following syntax to do that), and then insert the data into your main table.

WITH table_1 AS (
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY [pk_field] ORDER BY [date])
    FROM [temporary_table]
)
DELETE FROM table_1 WHERE RN > 1;
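
As a rough end-to-end sketch of the staging-table idea, driven from Python: the example below loads the batch into a key-less temporary table with executemany(), keeps one row per ID with ROW_NUMBER(), and copies over only the IDs the main table does not already contain. The `NOT EXISTS` filter is an addition to the steps above, needed when the main table may already hold some of the IDs. It assumes the standard-library `sqlite3` module as a stand-in so that it runs anywhere (window functions need SQLite 3.25 or newer); the names `comments`, `staging` and `insert_via_staging` are hypothetical, and on SQL Server the temporary table would typically be named something like `#staging`.

```python
import sqlite3


def insert_via_staging(conn, rows):
    """Load rows into a key-less staging table, then copy only new, unique IDs."""
    cur = conn.cursor()
    # The staging table has no primary key, so duplicates never raise here.
    cur.execute("CREATE TEMP TABLE staging (id TEXT, text TEXT)")
    cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)

    # Keep one row per id (rn = 1) and skip ids already in the main table.
    cur.execute(
        """
        INSERT INTO comments (id, text)
        SELECT id, text
        FROM (
            SELECT id, text,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY text) AS rn
            FROM staging
        ) AS s
        WHERE s.rn = 1
          AND NOT EXISTS (SELECT 1 FROM comments AS c WHERE c.id = s.id)
        """
    )
    inserted = cur.rowcount
    cur.execute("DROP TABLE staging")
    conn.commit()
    return inserted


# Demo with a hypothetical two-column main table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO comments VALUES ('a', 'already stored')")

batch = [
    ("a", "duplicate of stored row"),
    ("b", "b first"),
    ("b", "b second"),
    ("c", "c only"),
]
inserted = insert_via_staging(conn, batch)
stored = conn.execute("SELECT id, text FROM comments ORDER BY id").fetchall()
```

Doing the de-duplication in one set-based statement keeps the bulk insert fast, and the `ORDER BY` inside the window decides which of several rows with the same ID is kept.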
user3665906