Is it possible to execute multiple queries at the same time in impala ? If yes, how does impala handle it?
4 Answers
I would certainly do some tests on your own, but I was not able to get multiple queries to execute: I was using Impala connection, and reading query from a .sql file. This works for single commands.
from impala.dbapi import connect
# actual server and port changed for this post for security
conn=connect(host='impala server', port=11111,auth_mechanism="GSSAPI")
cursor = conn.cursor()
cursor.execute((open("sandbox/z_temp.sql").read()))
This is the error I received.
HiveServer2Error: AnalysisException: Syntax error in line 2:
This is what the SQL looked like in the .sql file.
Select * FROM database1.table1;
Select * FROM database1.table2;
I was able to run multiple commands with the SQL commands in separate .sql files iterating over all .sql files in a specified folder.
#Create list of file names for recon .sql files this will be sorted
#Numbers at begining of filename are important to sort so that files will be executed in correct order
file_names = glob.glob('folder/.sql')
asc_names = sorted(file_names, reverse = False)
filename = ""
for file_name in asc_names:
str_filename = str(file_name)
print(filename)
query = (open(str_filename).read())
cursor = conn.cursor()
# creates an error log dataframe to print, or write to file at end of job.
try:
# Each SQL command must be executed seperately
cursor.execute(query)
df_id= pd.DataFrame([{'test_name': str_filename[-40:], 'test_status': 'PASS'}])
df_log = df_log.append(df_id, ignore_index=True)
except:
df_id= pd.DataFrame([{'test_name': str_filename[-40:], 'test_status': 'FAIL'}])
df_log = df_log.append(df_id, ignore_index=True)
continue
Another way to do this would be to have all of the SQL statements in one .sql file separated by ; then loop thru the .sql file splitting statements out by ; running one at a time.
from impala.dbapi import connect
from impala.util import as_pandas
conn=connect(host='impalaserver', port=11111, auth_mechanism='GSSAPI')
cursor = conn.cursor()
# split SQL statements from one file seperated by ';', Note: last command will not have semicolon at end.
sql_file = open("sandbox/temp.sql").read()
sql = sql_file.split(';')
for cmd in sql:
# This gets rid of the non printing characters you may have
cmd = cmd.replace('/r','')
cmd = cmd.replace('/n','')
# This runs your SQL commands one at a time.
cursor.execute(cmd)
print(cmd)

- 51
- 9
Impala can execute multiple queries at the same time as long as it doesn't hit the memory cap.

- 2,511
- 4
- 20
- 20
You can issue a command like impala-shell -f <<file_name>>
, where the file has multiple queries each complete query separated by a semi colon (;)

- 1,541
- 1
- 12
- 21
If you are a python geek, you can even try the impyla package to create multiple connections and run all your queries at once.
pip install impyla

- 310
- 1
- 3
- 11
-
1I don't think impyla can execute multiple statement in same execute call. It uses dbapi standards which does not allow multiple statement in single execute. Please let me know if you are able to execute this with impyla. – Kishan Pandey Nov 15 '19 at 06:59