
I'm trying to fetch data in JH using Impyla. Everything works fine, except that tables in one DB return values as bytes (b'' format).

Code:

from impala.dbapi import connect

# host, user_name, db and sql are placeholders defined elsewhere
conn = connect(host=host, port=21050, user=user_name, use_ssl=True, auth_mechanism='GSSAPI', kerberos_service_name='impala', database=db)
cursor = conn.cursor()
cursor.execute(sql)
data = cursor.fetchall()

Example output:

b'', b'UK', b'X', b'Hlavn\xc3\xad 51',

It happens on only one DB. The other DBs and tables I have tested (4 DBs in total) return proper utf-8 strings. Also, not every column in the affected tables comes back as b''.
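To pin down which columns come back as bytes, a small check like the one below could help. It is only a sketch: it relies on the DB-API convention that the first element of each `cursor.description` entry is the column name, `bytes_columns` is a hypothetical helper name, and the sample description and rows are made up for illustration rather than taken from the real tables.

```python
def bytes_columns(description, rows):
    """Return the names of columns that contain at least one bytes value."""
    # In the DB-API, each description entry starts with the column name.
    names = [col[0] for col in description]
    found = []
    for idx, name in enumerate(names):
        if any(isinstance(row[idx], bytes) for row in rows):
            found.append(name)
    return found


# Made-up sample standing in for cursor.description and cursor.fetchall()
sample_description = [("id", "INT"), ("country", "VARCHAR"), ("note", "STRING")]
sample_rows = [(1, b"UK", "ok"), (2, b"CZ", "fine")]

print(bytes_columns(sample_description, sample_rows))
```

With a real connection, this would be called as `bytes_columns(cursor.description, data)` after `fetchall()`.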

Packages:

impyla 0.17.0 pypi_0 pypi
bitarray 2.1.0 pypi_0 pypi
six 1.14.0 py_1 conda-forge
thrift 0.11.0 pypi_0 pypi
thrift-cpp 0.13.0 h62aa4f2_2 conda-forge
thrift-sasl 0.4.3 pypi_0 pypi
thriftpy 0.3.9 py37h516909a_1001 conda-forge
thriftpy2 0.4.14 py37h5e8e339_0 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge

However, if I run the same query directly on the server instead of from JH, the output is correctly decoded - no bytes.

Packages on server:

impyla 0.16.3 py37hc8dfbb8_0 conda-forge
bitarray 2.0.1 py37h5e8e339_0 conda-forge
thrift 0.13.0 py37hcd2ae1e_2 conda-forge
thrift_sasl 0.4.2 py37h8f50634_0 conda-forge
thriftpy 0.3.9 py37h516909a_1001 conda-forge
thriftpy2 0.4.14 py37h5e8e339_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
krb5 1.19.1 hcc1bbae_0 conda-forge

Any clues? :) Thank you.


EDIT (07. 06.): The values are bytes because the affected columns are varchar. String columns come back as utf-8 strings, but varchar and char columns come back as bytes. It appears this changed with a version upgrade, which matches the server/JH difference I described above (impyla 0.16.3 on the server vs 0.17.0 in JH). I would have solved this by downgrading, but the lower version returns "invalid query handle" when selecting a large number of rows :(

I'm adding this link, which describes the issue, a workaround and future progress: https://github.com/cloudera/impyla/issues/455
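Independently of the linked issue, one possible client-side workaround is to decode any bytes values after fetching. This is a minimal sketch, not necessarily the approach suggested in the issue: it assumes the bytes are utf-8 encoded (which matches the example output above), and `decode_value` and `decode_rows` are hypothetical helper names.

```python
def decode_value(value, encoding="utf-8"):
    """Decode bytes to str; leave every other type untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_rows(rows, encoding="utf-8"):
    """Apply decode_value to every field of every fetched row."""
    return [tuple(decode_value(v, encoding) for v in row) for row in rows]


# Row taken from the example output in the question
rows = [(b"", b"UK", b"X", b"Hlavn\xc3\xad 51")]
print(decode_rows(rows))
```

With a real cursor this would be `data = decode_rows(cursor.fetchall())`. Non-bytes values (ints, None, already decoded strings) pass through unchanged, so it should be safe to apply to tables that mix string and varchar columns.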

Cappylol