
I have Python code to insert DataFrame data created from a CSV file into Postgres. My data are not cleaned beforehand, so I'm checking for errors by inserting rows into the database one by one.

Often I get an error that my value is too long for type VARCHAR(15), etc. That's okay, but when I get the error, my data are not inserted at all. I would like the code to continue inserting data when an error occurs instead of stopping entirely...


import psycopg2
import pandas as pd
from io import StringIO


def df2db(conn: psycopg2.extensions.connection, df: pd.DataFrame, table: str):
    columns = list(df.columns)
    buf = StringIO()
    df.to_csv(buf, sep='\t', na_rep='\\N', index=False, header=False)
    buf.seek(0)
    cursor = conn.cursor()
    for row in buf:
        row = row.replace("\n", "").split("\t")
        row = ["'" + val + "'" for val in row]
        try:
            cursor.execute(f"INSERT INTO {table} ({','.join(columns)}) VALUES({','.join(row)}) ON CONFLICT DO NOTHING")
        except psycopg2.Error:
            conn.rollback()
            continue  # here it continues, but my data are not inserted
    cursor.close()

BTW, I have about 20 million records, so I can't afford expensive processing.

Marcel Kopera

1 Answer


No “expensive” processing is needed. Just cut the strings to the field lengths defined by the database schema, either on the Python side or with functions in the query.
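For the Python side, here is a minimal sketch; varchar_limits and truncate_to_schema are hypothetical helpers, and the information_schema query assumes the table sits in your default search path:

import psycopg2
import pandas as pd
from typing import Dict


def varchar_limits(conn: psycopg2.extensions.connection, table: str) -> Dict[str, int]:
    # Read the VARCHAR(n) limits straight from the schema, so you don't have
    # to hard-code a length for every column by hand.
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT column_name, character_maximum_length
            FROM information_schema.columns
            WHERE table_name = %s AND character_maximum_length IS NOT NULL
            """,
            (table,),
        )
        return dict(cur.fetchall())


def truncate_to_schema(df: pd.DataFrame, limits: Dict[str, int]) -> pd.DataFrame:
    # Cut each string column down to its VARCHAR length. NaN values pass through
    # .str.slice untouched; this assumes the columns hold strings (they came from a CSV).
    for col, max_len in limits.items():
        if col in df.columns:
            df[col] = df[col].str.slice(0, max_len)
    return df

Calling truncate_to_schema(df, varchar_limits(conn, table)) before your insert loop means no row is rejected for being too long.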

But I'd do it differently: use psql, pgAdmin, or the COPY SQL statement to load the CSV as-is into a temporary table (that will be very fast), and then run a query that copies the data over, cutting strings to their maximum lengths.
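A sketch of that route using psycopg2's copy_expert, assuming a hypothetical target table target_table with columns col_a VARCHAR(15) and col_b VARCHAR(50); adjust the names, lengths, and CSV options to your real schema and files:

import psycopg2


def load_via_temp_table(conn: psycopg2.extensions.connection, csv_path: str):
    with conn.cursor() as cur, open(csv_path, encoding="utf-8") as f:
        # 1. Stage the raw CSV in a temp table with plain text columns,
        #    so nothing is rejected for being too long.
        cur.execute("CREATE TEMP TABLE staging (col_a text, col_b text) ON COMMIT DROP")
        cur.copy_expert("COPY staging FROM STDIN WITH (FORMAT csv, HEADER true)", f)
        # 2. Copy into the real table, truncating each string to its VARCHAR limit.
        cur.execute(
            """
            INSERT INTO target_table (col_a, col_b)
            SELECT left(col_a, 15), left(col_b, 50)
            FROM staging
            ON CONFLICT DO NOTHING
            """
        )
    conn.commit()

Since COPY ... FROM STDIN streams the file over the existing database connection, copy_expert also works from a client that isn't on the database host.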

See this q&a for more details.

Kuba hasn't forgotten Monica
    thanks for the answer. I'm inserting 4 CSV files with 30 columns each, so I think it would be very verbose to write out the max length for 120 columns in Python. And I'm not inserting from localhost, so would the second approach need `paramiko` or something like that? – Marcel Kopera Dec 25 '20 at 00:34