
I need to replace a table in Hive with a new pandas dataframe. I am using pyhive to create a connection engine and subsequently using pandas.to_sql with 'if_exists' as replace.

import pandas as pd
from pyhive import hive

my_data = pd.read_csv('my_data.csv')

conn = hive.Connection(host="111.11.11.11", port=10000, username="abcd")

my_data.to_sql(name='table_name', con=conn, if_exists='replace', schema='my_schema')

conn.close()

However, this results in an unexpected error as follows:

DatabaseError: Execution failed on sql: SELECT name FROM sqlite_master WHERE type='table' AND name=?;
not all arguments converted during string formatting
unable to rollback

Other answers seem to indicate that this is related to to_sql expecting a SQLAlchemy engine - I was under the impression that this is what PyHive uses to create a connection. Any guidance on how to appropriately execute this command would be appreciated.

It is worth noting that elsewhere, this same connection has no problem reading data from Hive.

Works as expected:

conn = hive.Connection(host="111.11.11.11", port=10000, username="abcd")

my_data = pd.read_sql('select * from my_table', conn)

conn.close()
sjs
1 Answer


Other answers seem to indicate that this is related to to_sql expecting a SQLAlchemy engine - I was under the impression that this is what PyHive uses to create a connection.

PyHive provides a SQLAlchemy dialect and can create a SQLAlchemy Engine object, but hive.Connection() gives you a plain DBAPI connection, not an Engine. As illustrated in the PyHive docs, you need to do something like

from sqlalchemy import create_engine

engine = create_engine('hive://localhost:10000/default')

and then pass the engine object to to_sql.
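
Putting that together with the host, port, username, and schema from your question, a minimal sketch would look like this (the exact hive:// URL and any extras your cluster requires, e.g. SASL/auth settings, may differ):

from sqlalchemy import create_engine
import pandas as pd

# URL format is hive://<username>@<host>:<port>/<database>
engine = create_engine('hive://abcd@111.11.11.11:10000/my_schema')

my_data = pd.read_csv('my_data.csv')

# index=False keeps the DataFrame index from being written as an extra column;
# the target database is already set in the URL, so no schema argument is needed
my_data.to_sql(name='table_name', con=engine, if_exists='replace', index=False)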

[read_sql] Works as expected

read_sql will often work with just a DBAPI connection, but to_sql requires a SQLAlchemy Connectable (Engine or Connection) because it may need to generate DDL. That is also why your traceback mentions sqlite_master: when pandas is handed a DBAPI connection it does not recognize, it falls back to its SQLite code path, and that probe query fails against Hive. See this answer for more information.
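
If you standardize on the Engine, the same object also works for reads, so you can use it in both directions (sketch, assuming the same connection details as above):

my_data = pd.read_sql('select * from my_table', engine)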

Gord Thompson