Update Redshift using pandas dataframe is failing with string index out of range

Question

I am trying to update a Redshift table using psycopg2 and psycopg2.extras but it is failing with the below error. Can someone help with this error?

{
  "errorMessage": "string index out of range",
  "errorType": "IndexError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 110, in lambda_handler\n    psycopg2.extras.execute_values (cursor, update_query, row, template=None, page_size=2000)\n",
    "  File \"/opt/python/lib/python3.8/site-packages/psycopg2/extras.py\", line 1289, in execute_values\n    parts.append(cur.mogrify(template, args))\n"
  ]
}

I have a dataframe with 23 columns which I am trying to updates as below in AWS Lambda. The connection to the database is successful but update is failing:

import psycopg2
import psycopg2.extras

df_pandas ## dataframe with 23 columns and 28 rows
connection = psycopg2.connect(host='casuc', dbname='skhcbiw',
                                             user='cksbci', password='****', port=0000)
                cursor = connection.cursor()
                #UPDATE: INSTEAD OF ITERTUPLES, I HAVE REPLACED IT with ITERROWS WHICH HAS GIVEN A DIFFERENT ERROR FROM THE ONE ABOVE. ERROR specified below the CODE
                for _, row in df_pandas.iterrows():
                    row = str(tuple(row)) #create a tuple that is a string
                    row = row[1:len(row)-1] #remove the beginning & ending ()
                    print(row)
                
                    update_query = """UPDATE table AS t 
                                      SET column1 = e.column1, column2 = e.column2, column3 = e.column3, 
                                          ......................................................
                                          ......................................................
                                          column22 = e.column22, column23 = e.column23
                                      FROM (VALUES %s) AS e('column1', 'column2', 'column3',
                                                             .......................................
                                                             .......................................
                                                             .......................................
                                                             .......................................
                                                             'column21', 'column22', 'column23') 
                                      WHERE e.column23 = t.column23;"""
                    psycopg2.extras.execute_values (cursor, update_query, row, template=None, page_size=2000)

NEW ERROR


{
  "errorMessage": "syntax error at or near \")\"\nLINE 9: ...','1','7','6','5','1','''',',',' ','0','.','0',',',' ','0'))\n                                                                      ^\n",
  "errorType": "SyntaxError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 118, in lambda_handler\n    psycopg2.extras.execute_values(cursor, update_query, (row, ), template=None, page_size=2000)\n",
    "  File \"/opt/python/lib/python3.8/site-packages/psycopg2/extras.py\", line 1292, in execute_values\n    cur.execute(b''.join(parts))\n"
  ]
}

My INPUT ROW IS AS BELOW

'Zone 99', 'J005', 'Accepted', 'BIWUDBI', 'MNO101', '90.00H50 IUHIUH   YY 55RR', '878767', 0, 'Knoidci', 'A99', 0.0, 0, '2192238', '2020-12-31', 0.0, 0.0, 0.0, 0, 0, 0, '50017651', 0.0, 0

I see that each of the values of the data '50017651' is getting passed as '5', '0', '0',.... . I am not sure what is the reason?

I have referred these 2 URLs from StackOverflow to solve my problem but no luck.

Thanks Ganesh

Can you run `select * from stl_load_errors` on your cluster? — Robert Kossendey, May 20 '21 at 13:51
Yes I did and got the below error but for 29th April and not today. ``` err_code err_reason 1207 Overflow, 9.27549161e-39 (Float valid range 1.175494e-38 to 3.402823e+38)``` — Ganesh Bhat, May 20 '21 at 14:28

score 0 · Accepted Answer · answered Jun 08 '21 at 07:31

The above code did not work and hence I decided to use the approach mentioned in the link: How to update a Postgres table column using a pandas data frame?

The approach is to create a TEMP TABLE and dump/insert the data into this table and then update the table I want to using this TEMP TABLE. Once the connection is committed, the TEMP TABLE is automatically deleted in AWS Redshift. We don't need to specify 'ON COMMIT DROP' in the TEMP TABLE code.

import psycopg2

conn = psycopg2.connect("dbname='db' user='user' host='localhost' password='test'")
cur = conn.cursor()

rows = zip(df.id, df.z)
cur.execute("""CREATE TEMP TABLE codelist(id INTEGER, z INTEGER) ON COMMIT DROP""")
cur.executemany("""INSERT INTO codelist (id, z) VALUES(%s, %s)""", rows)

cur.execute("""
    UPDATE table_name
    SET z = codelist.z
    FROM codelist
    WHERE codelist.id = vehicle.id;
    """)

cur.rowcount
conn.commit()
cur.close()
conn.close()

Regards Ganesh Bhat

Update Redshift using pandas dataframe is failing with string index out of range

1 Answers1