Looping trough dataframe and checking rows before appending to database

Question

Question How do I append my dataframe to database so that it checks if stock_ticker exists , only to append the rows where stock_ticker does not exist?

This is the process that I did

Import CSV file to pandas dataframe
Assign column names to be same as in database
Sending the dataframe to database using the code below but getting

sqlite3.IntegrityError: UNIQUE constraint failed: stocks.stock_ticker

conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()

df.to_sql(name='stocks', con=conn, if_exists='append', index=False)

conn.commit()

I looked at other Integrity Error cases but can't seem to find one that works with appending dataframes? I found and tried this but all it does is just not append anything.

try:
    conn = sqlite3.connect('stockmarket.db')
    c = conn.cursor()
    df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
    conn.commit()
except sqlite3.IntegrityError:
    print("Already in database")

I am not sure I am understanding the iterating thing correctly

How to iterate over rows in a DataFrame in Pandas

So I tried this, but it just prints out already in database for each of them. Even tough there is 4 new stock tickers.

for index, row in df.iterrows():
    try:
        conn = sqlite3.connect('stockmarket.db')
        c = conn.cursor()
        df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        print("Already in database")

The database looks like this

any insight much appreciated :)

tania · Accepted Answer · 2020-09-14T07:00:29.400

1

It looks like this happens because Pandas doesn't allow for declaring a proper ON CONFLICT policy, in case you try to append data to a table that has the same (unique) primary key or violates some other UNIQUEness constraint. if_exists only refers to the whole table itself, not each individual row.

I think you already came up with a pretty good answer, and maybe with a small modification it would work for you:

# After connecting
for i in range(len(df)):
    try:
        df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        pass

Now, this might be a problem if you want to actually replace the value if a newer one appears in your Pandas data and let's say you want to replace the old one that you have in the database. In that case, you might want to use the raw SQL command as a string, and pass the Pandas values iteratively. For example:

insert_statement = """
INSERT INTO stocks (stock_id,
                    stock_ticker,
                    {other columns})
VALUES (%s, %s, {as many %s as columns})
ON CONFLICT (stock_id) DO UPDATE
    SET {Define which values you will update on conflict}"""

And then you could run

for i in range(len(df)):
    values = tuple(df.iloc[i])
    cursor.execute(insert_statement, values)

edited Sep 14 '20 at 07:00

answered Sep 13 '20 at 21:39

tania

2,104
10
18

Thank you for taking the time to help me with this :). I tried the first code, I made an edit in the original question to show how it looks and the error I am getting. Did I add it correctly? For the second part that you suggested. I set up all the column names, but what do i change the %s to? Also not sure waht i change the dots in set... too. Sorry for noobie questions >.< On day 42/100 day learning to code challenge haha – Jakub Sep 13 '20 at 22:24
@Jakub I just edited my response to fix that. It seems that if you take `df.iloc[i]`, Pandas doesn't know how to send this series to the database (hence `table stocks has no column named 0`). I changed that line in the first block. Regarding the second one, the specific syntax would really depend on the database. For example in Postgres, the syntax works like this: https://www.postgresqltutorial.com/postgresql-upsert/ Finally, the `%s` is just a notation for string formatting, see here: https://stackoverflow.com/questions/902408/how-to-use-variables-in-sql-statement-in-python – tania Sep 14 '20 at 07:04
1

Thank you so much for the help! Your edit solved my question :) , now to try and figure out from the links on how to do the 2nd part you added. I didn't even realize it until you pointed it out. That yea, I need to be able to update the rows for the stock_tickers with new info w/o adding new stock_tickers if they already exists >. – Jakub Sep 14 '20 at 19:47

Looping trough dataframe and checking rows before appending to database

1 Answers1