I am trying to learn how to save dataframe created in pandas into postgresql db (hosted on Azure). I planned to start with simple dummy data:
data = {'a': ['x', 'y'],
'b': ['z', 'p'],
'c': [3, 5]
}
df = pd.DataFrame (data, columns = ['a','b','c'])
I found a function that pushed df data into psql table. It starts with defining connection:
def connect(params_dic):
""" Connect to the PostgreSQL database server """
conn = None
try:
# connect to the PostgreSQL server
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(**params_dic)
except (Exception, psycopg2.DatabaseError) as error:
print(error)
sys.exit(1)
print("Connection successful")
return conn
conn = connect(param_dic)
*param_dic contains all connection details (user/pass/host/db) Once connection is established then I'm defining execute function:
def execute_many(conn, df, table):
"""
Using cursor.executemany() to insert the dataframe
"""
# Create a list of tupples from the dataframe values
tuples = [tuple(x) for x in df.to_numpy()]
# Comma-separated dataframe columns
cols = ','.join(list(df.columns))
# SQL quert to execute
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
cursor = conn.cursor()
try:
cursor.executemany(query, tuples)
conn.commit()
except (Exception, psycopg2.DatabaseError) as error:
print("Error: %s" % error)
conn.rollback()
cursor.close()
return 1
print("execute_many() done")
cursor.close()
I executed this function to a psql table that I created in the DB:
execute_many(conn,df,"raw_data.test")
The table raw_data.test consists of columns a(char[]), b(char[]), c(numeric). When I run the code I get following information in the console:
Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL: Array value must start with "{" or dimension information.
I don't know how to interpret it because none of the columns in df are array
df.dtypes
Out[185]:
a object
b object
c int64
dtype: object
Any ideas what goes wrong there or suggestions how to maybe save df in pSQL in a simpler manner? I found quite a lot of solutions that use sqlalchemy with creating connection string in following way:
conn_string = 'postgres://user:password@host/database'
But I am not sure if that works on cloud db- if I try to edit such connection string with azure host details it does not work.