I need to replace a table in Hive with a new pandas dataframe. I am using pyhive to create a connection engine and subsequently using pandas.to_sql with 'if_exists' as replace.
from pyhive import hive
my_data = pd.read_csv('my_data.csv')
conn = hive.Connection(host="111.11.11.11", port=10000, username="abcd")
my_data.to_sql(name='table_name', con=conn, if_exists='replace', schema='my_schema')
conn.close()
However, this results in an unexpected error as follows:
DatabaseError: Execution failed on sql: SELECT name FROM sqlite_master WHERE type='table' AND name=?;
not all arguments converted during string formatting
unable to rollback
Other answers seem to indicate that this is related to to_sql expecting a SqlAlchemy engine - I was under the impression that this is what pyhive uses to create a connection. Any guidance on how to appropriately execute this command would be appreciated.
It is worth noting that elsewhere, this same connection has no problem reading data from Hive.
Works as expected:
conn = hive.Connection(host="111.11.11.11", port=10000, username="abcd")
my_data = pd.read_sql('select * from my_table', conn)
conn.close()