I'm trying to fetch data in JH using Impyla. Everything works fine, except that tables in one DB return their data as bytes (b'' format).
Code:
from impala.dbapi import connect
conn = connect(host=host, port=21050, user={userName}, use_ssl=True, auth_mechanism='GSSAPI', kerberos_service_name='impala', database=db)
cursor = conn.cursor()
cursor.execute(sql)
data = cursor.fetchall()
Example output:
b'', b'UK', b'X', b'Hlavn\xc3\xad 51',
It happens in only one DB; the other DBs and tables I have tested return proper utf-8 strings (tested on 4 DBs). Also, not every column comes back as b''.
Packages:
impyla 0.17.0 pypi_0 pypi
bitarray 2.1.0 pypi_0 pypi
six 1.14.0 py_1 conda-forge
thrift 0.11.0 pypi_0 pypi
thrift-cpp 0.13.0 h62aa4f2_2 conda-forge
thrift-sasl 0.4.3 pypi_0 pypi
thriftpy 0.3.9 py37h516909a_1001 conda-forge
thriftpy2 0.4.14 py37h5e8e339_0 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge
However, if I run the same query directly on the server instead of from JH, the output has the correct encoding, with no bytes.
Packages on server:
impyla 0.16.3 py37hc8dfbb8_0 conda-forge
bitarray 2.0.1 py37h5e8e339_0 conda-forge
thrift 0.13.0 py37hcd2ae1e_2 conda-forge
thrift_sasl 0.4.2 py37h8f50634_0 conda-forge
thriftpy 0.3.9 py37h516909a_1001 conda-forge
thriftpy2 0.4.14 py37h5e8e339_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
krb5 1.19.1 hcc1bbae_0 conda-forge
Any clues? :) Thank you.
EDIT (07. 06.): The values come back as bytes because the columns are varchar. String columns are returned as utf-8 strings, but varchar and char columns are returned as bytes. It appears this changed with a version upgrade, which matches the server/JH behaviour I described above (different versions). I would have solved this by downgrading, but the lower version returns "invalid query handle" when selecting a large number of rows :(
I'm adding this link, which describes the issue, a workaround and future progress: https://github.com/cloudera/impyla/issues/455
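In the meantime, a minimal sketch of a client-side fix, assuming the bytes values are utf-8 encoded (as the example output above suggests): decode every bytes value after fetching and leave everything else untouched. The helper name decode_rows is hypothetical and not part of impyla, and this is not necessarily the workaround described in the linked issue.

```python
def decode_rows(rows, encoding="utf-8"):
    """Return rows with every bytes value decoded to str; other values are unchanged."""
    return [
        tuple(v.decode(encoding) if isinstance(v, bytes) else v for v in row)
        for row in rows
    ]


# Sample row modeled on the example output above (no database connection needed).
sample = [(b'', b'UK', b'X', b'Hlavn\xc3\xad 51', 42, None)]
decoded = decode_rows(sample)
print(decoded)
```

With the cursor from the code above, this would be used as data = decode_rows(cursor.fetchall()). If a column holds bytes that are not valid utf-8, decode() raises UnicodeDecodeError, so the assumed encoding is worth checking first.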