I'm trying to make a specific insert statement that has an `ON CONFLICT` clause (I'm uploading to a Postgres database). Will passing a callable as `df.to_sql(method=...)` allow that, or is it intended for another purpose? I've read through the documentation, but I wasn't able to grasp the concept. I looked around on this website and others for similar questions, but I haven't found one yet. If possible, I would love to see an example of how to use a callable `method` in practice. Any other ideas on how to efficiently load large numbers of rows from pandas using `ON CONFLICT` logic would be much appreciated as well. Thanks in advance for the help!

nucsit026

Sean Morey
Good question. Not sure if you've already found [this](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#insertion-method), but it might help. – Erfan Jan 21 '20 at 16:22
Hey @Erfan! I had seen that, but I was confused by it. I see that they've created a function that uses a cursor, but I don't see how that function is actually used inside pandas. I'm sure the answer I need is in that example, but it hasn't clicked for me yet. :( – Sean Morey Jan 21 '20 at 16:29
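To see how pandas actually uses the callable, here is a minimal sketch that runs without a Postgres server. It swaps Postgres for Python's built-in `sqlite3` (SQLite has supported `ON CONFLICT DO NOTHING` since version 3.24) and builds the SQL by hand; the function name `insert_on_conflict_do_nothing`, the table `some_table` and its columns are hypothetical. pandas calls whatever you pass as `method` once per chunk as `method(pd_table, conn, keys, data_iter)`, and that call replaces pandas' own insert.

```python
import sqlite3

import pandas as pd


def insert_on_conflict_do_nothing(pd_table, conn, keys, data_iter):
    """Hypothetical callable for DataFrame.to_sql(method=...).

    pandas calls it once per chunk with:
      pd_table  - pandas' internal table wrapper (pd_table.name is the table name)
      conn      - what pandas executes on (a sqlite3 cursor for a raw sqlite3 connection)
      keys      - list of column names being inserted
      data_iter - iterable of row tuples for this chunk
    """
    columns = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        f'INSERT INTO "{pd_table.name}" ({columns}) VALUES ({placeholders}) '
        "ON CONFLICT DO NOTHING"  # SQLite >= 3.24 upsert syntax
    )
    # executemany returns the cursor; its rowcount is the number of rows changed
    return conn.executemany(sql, list(data_iter)).rowcount


con = sqlite3.connect(":memory:")
# ON CONFLICT only fires against a unique constraint, so create the table first
con.execute("CREATE TABLE some_table (review_id INTEGER PRIMARY KEY, body TEXT)")

first = pd.DataFrame({"review_id": [1, 2], "body": ["a", "b"]})
second = pd.DataFrame({"review_id": [2, 3], "body": ["B", "c"]})  # review_id 2 collides

for df in (first, second):
    df.to_sql("some_table", con, if_exists="append", index=False,
              method=insert_on_conflict_do_nothing)

rows = con.execute("SELECT review_id, body FROM some_table ORDER BY review_id").fetchall()
print(rows)
```

With a SQLAlchemy connection to Postgres instead, `conn` would be a SQLAlchemy `Connection` and `pd_table.table` a SQLAlchemy `Table`, which is what the `to_sql_on_conflict_do_nothing` answer relies on.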
2 Answers
Here's an example of how to use Postgres's `ON CONFLICT DO NOTHING` with `to_sql`:
```python
# import the Postgres-specific insert construct
from sqlalchemy.dialects.postgresql import insert


def to_sql_on_conflict_do_nothing(pd_table, conn, keys, data_iter):
    # This is very similar to the default to_sql insert in pandas;
    # only the conn.execute line is changed
    data = [dict(zip(keys, row)) for row in data_iter]
    result = conn.execute(insert(pd_table.table).on_conflict_do_nothing(), data)
    return result.rowcount  # pandas >= 1.4 sums these into to_sql's return value


# `engine` is an existing SQLAlchemy engine; begin() commits when the block exits
with engine.begin() as conn:
    df.to_sql("some_table", conn, if_exists="append", index=False,
              method=to_sql_on_conflict_do_nothing)
```

Decko
For anyone else interested in referencing the default to_sql function in pandas, here's the [source](https://github.com/pandas-dev/pandas/blob/8cb7cfe326219b83ccc8a0d0b211fab40f4575c1/pandas/io/sql.py#L853) – John Sep 06 '22 at 20:33
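The same callable pattern extends to `ON CONFLICT ... DO UPDATE` (an upsert). The sketch below assumes SQLAlchemy 1.4+ and a unique index or constraint on the conflict column; `build_upsert`, `make_upsert_method`, `some_table`, `review_id` and `body` are hypothetical names. A factory is used because pandas always calls the method with exactly four arguments, so the conflict columns have to be captured in a closure. The statement is only compiled here, so no database is needed to inspect the generated SQL.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert


def build_upsert(table, keys, conflict_cols):
    """INSERT ... ON CONFLICT (conflict_cols) DO UPDATE SET col = excluded.col."""
    stmt = insert(table)
    # stmt.excluded refers to the row that was proposed for insertion
    update_cols = {k: stmt.excluded[k] for k in keys if k not in conflict_cols}
    if not update_cols:  # nothing left to update, so just skip duplicates
        return stmt.on_conflict_do_nothing(index_elements=list(conflict_cols))
    return stmt.on_conflict_do_update(index_elements=list(conflict_cols), set_=update_cols)


def make_upsert_method(conflict_cols):
    """Return a callable usable as DataFrame.to_sql(method=...)."""
    def upsert(pd_table, conn, keys, data_iter):
        data = [dict(zip(keys, row)) for row in data_iter]
        result = conn.execute(build_upsert(pd_table.table, keys, conflict_cols), data)
        return result.rowcount
    return upsert


# Offline check: compile for a hypothetical table, no database connection required
demo = Table(
    "some_table", MetaData(),
    Column("review_id", Integer, primary_key=True),
    Column("body", String),
)
pg = postgresql.dialect()
upsert_sql = str(build_upsert(demo, ["review_id", "body"], ["review_id"]).compile(dialect=pg))
skip_sql = str(build_upsert(demo, ["review_id"], ["review_id"]).compile(dialect=pg))
print(upsert_sql)
```

Usage would mirror the answer above, e.g. `df.to_sql("some_table", conn, if_exists="append", index=False, method=make_upsert_method(["review_id"]))` inside `with engine.begin() as conn:`. Without a matching unique index, Postgres rejects the statement with "there is no unique or exclusion constraint matching the ON CONFLICT specification".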
I just had a similar problem, and following this answer I came up with a solution for sending a DataFrame to PostgreSQL with `ON CONFLICT`:
1. write the DataFrame to a table
```python
from sqlalchemy import create_engine

engine = create_engine(connection_string)
df.to_sql(table_name, engine)
```
2. add a primary key (SQLAlchemy's automap only maps tables that have one)
```sql
ALTER TABLE table_name ADD COLUMN id SERIAL PRIMARY KEY;
```
3. create a unique index on the column (or columns) whose uniqueness `ON CONFLICT` should check
```sql
CREATE UNIQUE INDEX review_id ON table_name (review_id);
```
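4. map the SQL table with SQLAlchemy
```python
from sqlalchemy.ext.automap import automap_base

ABase = automap_base()
ABase.prepare(autoload_with=engine)  # reflect existing tables (SQLAlchemy 1.4+)
Table = ABase.classes.table_name  # replace table_name with your table's name
Table.__tablename__ = 'table_name'
```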
5. do your insert with `ON CONFLICT`:
```python
from sqlalchemy.dialects.postgresql import insert

insrt_vals = df.to_dict(orient='records')
insrt_stmnt = insert(Table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['review_id'])
# engine.execute() was removed in SQLAlchemy 2.0; run it in a transaction instead
with engine.begin() as conn:
    results = conn.execute(do_nothing_stmt)
```

Dmitriy Grankin
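The unique index in step 3 is what makes `ON CONFLICT (review_id)` work at all: the conflict target has to match a unique index or constraint. The sketch below shows this with Python's built-in `sqlite3` standing in for Postgres (SQLite 3.24+ accepts the same `ON CONFLICT (...) DO NOTHING` clause); the `reviews` table, its columns and `insert_reviews` are hypothetical.

```python
import sqlite3


def insert_reviews(con, rows):
    """Insert (review_id, body) pairs, skipping review_ids that already exist."""
    con.executemany(
        "INSERT INTO reviews (review_id, body) VALUES (?, ?) "
        "ON CONFLICT (review_id) DO NOTHING",
        rows,
    )


con = sqlite3.connect(":memory:")
# `id` plays the role of the SERIAL primary key from step 2
con.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, review_id INTEGER, body TEXT)")

# Before step 3 there is no unique index on review_id, so the statement is rejected
try:
    insert_reviews(con, [(1, "a")])
    error = None
except sqlite3.Error as exc:
    error = str(exc)
print("without unique index:", error)

# Step 3: once the unique index exists, duplicate review_ids are skipped
con.execute("CREATE UNIQUE INDEX review_id_idx ON reviews (review_id)")
insert_reviews(con, [(1, "a"), (2, "b")])
insert_reviews(con, [(2, "B"), (3, "c")])  # review_id 2 already exists
rows = con.execute("SELECT review_id, body FROM reviews ORDER BY review_id").fetchall()
print(rows)
```

Postgres behaves the same way: without the index it raises "there is no unique or exclusion constraint matching the ON CONFLICT specification".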