Questions tagged [impyla]

Impyla is a Python client for HiveServer2 implementations, such as the Impala and Hive distributed query engines.

Features:

  • HiveServer2 compliant; works with Impala and Hive, including nested data

  • Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

  • Works with Kerberos, LDAP, SSL

  • SQLAlchemy connector

  • Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience
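Because the client follows DB API 2.0, query code can be written against the generic connection/cursor interface. The following is a minimal sketch rather than official usage: `fetch_all` and `connect_impala` are hypothetical helper names, the host is a placeholder, and an in-memory sqlite3 database stands in for a HiveServer2 endpoint so the example runs without a cluster.

```python
import sqlite3


def fetch_all(conn, sql, params=None):
    """Run a query on any PEP 249 connection and return (column_names, rows)."""
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # PEP 249: description is a sequence of 7-item sequences; item 0 is the name.
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


def connect_impala(host, port=21050):
    """Hypothetical helper: open an impyla connection (requires impyla and a server)."""
    # Imported lazily so the rest of this sketch runs without impyla installed.
    from impala.dbapi import connect
    return connect(host=host, port=port)


# Stand-in demo: sqlite3 is also a PEP 249 client, so the same helper works.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER, name TEXT)")
demo.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")
columns, rows = fetch_all(demo, "SELECT id, name FROM t ORDER BY id")
```

With a live cluster, `fetch_all(connect_impala("impala-host.example"), "SELECT ...")` would follow the same pattern. For the pandas route, `impala.util.as_pandas` takes a cursor on which a query has already been executed.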

52 questions
6
votes
3 answers

impyla (0.14.0) ERROR - 'TSocket' object has no attribute 'isOpen'

I am getting the following error while trying to create a connection to HiveServer Traceback (most recent call last): File "/Users/user_name/Desktop/ABCo/EEM/EntityManagement/lodurr/data_lake/hive_db.py", line 56, in wrapper …
ameyrupji
  • 61
  • 1
  • 2
4
votes
0 answers

`impyla` connecting to Hive on Dataproc: impala.error.HiveServer2Error: Invalid OperationHandle: OperationHandle

I'm using impyla to connect to Hive in Dataproc. The connection is created this way conn = impala.dbapi.connect( host=host, port=10000, user=None, password=None, …
zpz
  • 354
  • 1
  • 3
  • 16
3
votes
0 answers

Impyla is returning values in bytes format

I'm trying to receive data in JH from Impyla, everything works fine except tables in one DB are returning data in b'' format. Code: from impala.dbapi import connect conn = connect(host=host, port=21050, user={userName}, use_ssl=True,…
Cappylol
  • 31
  • 5
3
votes
0 answers

Why does impyla library HiveServer2Cursor close the hive session?

I'm using the impyla package's SQLAlchemy support to run a series of queries with Hive on Spark from Python. SQLAlchemy automatically creates and closes a dbapi cursor for each sql statement that is executed. Because the…
bgrommes
  • 65
  • 4
3
votes
1 answer

Executing Hive Scripts in Impyla

The examples I've seen for Impyla are for executing command line queries, i.e. the equivalent of running hive -e 'select * from my_db.my_table'. Is there functionality in Impyla to run something like: hive -f create_hive_table.hql \ …
John
  • 1,167
  • 1
  • 16
  • 33
3
votes
3 answers

impala connection via sqlalchemy

I'm new to hadoop and impala. I managed to connect to impala by installing impyla and executing the following code. This is connection by LDAP: from impala.dbapi import connect from impala.util import as_pandas conn =…
okyere
  • 171
  • 1
  • 3
  • 16
3
votes
2 answers

Getting detailed Impyla error message

When I execute a SQL statement in Impala using Python/Impyla, I just get an exception with a generic error message like "Operation is in ERROR_STATE". How do I get more detailed information about the error that occurred?
aaa90210
  • 11,295
  • 13
  • 51
  • 88
2
votes
1 answer

AWS Lambda Error: Unable to import module 'function_name': No module named 'module._module'

Please see the screenshots in particular after reading. I am deploying a python script on AWS Lambda which uses the package impyla which has a dependency on the package bitarray. from impala.dbapi import connect My python file is called…
Humza Khan
  • 733
  • 1
  • 6
  • 13
2
votes
1 answer

How to Impersonate Impala queries on Superset

I'm setting up Superset (0.36.0) in production mode (with Gunicorn), and I would like to set up impersonation when running Impala queries on my Kerberized cluster, so that each Superset user has the same privileges on tables/databases as he has on…
guilherme0170
  • 123
  • 1
  • 9
2
votes
1 answer

python - unable to connect to TLS1.2 enabled HiveServer2

I have HiveServer2 with SSL (TLS 1.2 minimum) and LDAP enabled, and Kerberos not enabled. hive.server2.transport.mode = binary. Beeline connections work fine like: beeline -u…
tooptoop4
  • 234
  • 3
  • 15
  • 45
2
votes
2 answers

Impyla Insert SQL from Flask: Syntax error (Identifier Binding)

Recently I set up a Flask POST endpoint to write data into Impala DB via the Impyla module. Env: Python 3.6.5 on CentOS. Impala version: impalad version 2.6.0-cdh5.8.0 api.py: from flask import Flask, request, abort, Response from flask_cors import…
suvtfopw
  • 928
  • 10
  • 18
2
votes
1 answer

Impala median calculation on big data

I've got access to some data with hundreds of millions of rows for any given month. There are 3 features: a string representing a date, a string representing a type, and a value representing an amount. Having access to Python and Impala (SQL), what's the…
Rob
  • 153
  • 12
2
votes
2 answers

impyla - as_pandas - empty dataframe

I have some simple impyla code, and I would like to create a pandas DataFrame from my cursor. My code runs, but my DataFrame is always empty. If I run my query directly on Impala, the result is not empty. This is how my code looks…
solarenqu
  • 804
  • 4
  • 19
  • 44
2
votes
1 answer

How to pass parameters in hive query when executing using impyla?

I referred to How to use variables in SQL statement in Python? but couldn't get an answer. I am trying what you suggested, but I get this error - :( tbl_nm = 'EMPLOYEE_TABLE' con.execute('select max(emp_id) from schema.?', tbl_nm) Getting below…
rajb2r
  • 31
  • 2
2
votes
0 answers

File not found error while testing python impyla

I am trying to set up a connection between python and impala. Based on the instructions here I am trying to set up impyla. I am on a vagrant ubuntu/xenial64 box with python 2.7.12. After reading about some issues with the latest thrift I downgraded…
Blueraaga
  • 93
  • 1
  • 7