I am working on a machine learning project with big data from a hadoop based data warehouse. Therefore, I am using lots of different SQL statements during the day. Sometimes I am modifying the data according to some features and afterwards, I need to feed those variables of dataframe into SQL statement to collect other results from database.
Until now, I've done research on the web but in all related answers you have to indicate each variable as ?
or %s
in the query. But that method is not feasible for me as sometimes I need to feed more than hundreds of variables so I can not be like
select * from table where id in (?,?,?,.....,?)
My construction to run SQL queries is as follows:
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host='0.0.0.0', port=0, auth_mechanism="blank")
impala_cursor = conn.cursor()
And to run the query
sql= '''SELECT name, id
FROM table
WHERE id in (---HERE COMES THE VARIABLES FROM PANDAS DATAFRAME---)
'''
impala_cursor.execute(sql)
df_var = as_pandas(impala_cursor)
Any help is welcome. Thanks in advance!