
While trying to decode values read from HBase, I get an error. Python apparently thinks the data is not UTF-8, but the Java application that put the data into HBase encoded it in UTF-8.

```
>>> a = '\x00\x00\x00\x00\x10j\x00\x00\x07\xe8\x02Y'
>>> a.decode("UTF-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 9: invalid continuation byte
```

Any thoughts?

  • This looks like some kind of binary (bytes) representation. You need to know the original data type to decode it. I'm looking for a solution myself. – Naomi Fridman May 22 '18 at 21:13
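Following the comment's point, here is a minimal Python 3 sketch of decoding the value as binary data instead of text. The layout is purely a hypothetical assumption for illustration: one 8-byte big-endian signed integer followed by one 4-byte big-endian signed integer (struct format `">qi"`). The real layout depends entirely on how the Java application serialized the value. Big-endian is assumed only because Java's `ByteBuffer` and `DataOutputStream` write multi-byte numbers in that order by default. The helper name `decode_cell` is also hypothetical.

```python
import struct

# The raw cell value from the question, as a Python 3 bytes object
# (Python 2 used a plain str literal for the same bytes).
raw = b'\x00\x00\x00\x00\x10j\x00\x00\x07\xe8\x02Y'

# HYPOTHETICAL layout: an 8-byte big-endian signed long followed by a
# 4-byte big-endian signed int (12 bytes total). Replace it with the
# layout the writing application actually used.
LAYOUT = ">qi"


def decode_cell(value: bytes, layout: str = LAYOUT) -> tuple:
    """Unpack a binary cell value according to a struct format string."""
    expected = struct.calcsize(layout)
    if len(value) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(value)}")
    return struct.unpack(layout, value)


print(decode_cell(raw))
```

Whether the resulting numbers mean anything can only be confirmed against the Java code that wrote the cell; if the lengths or values look wrong, the assumed layout is wrong.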

1 Answer


That data is not valid UTF-8, so if you really retrieved it unchanged from the database, you should check who or what put it there.

wouter bolsterlee
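To check the answer's claim, here is a short Python 3 sketch that reports where UTF-8 decoding fails. Byte `0xe8` (binary `11101000`) is the lead byte of a three-byte UTF-8 sequence, so it must be followed by two continuation bytes in the range `0x80` to `0xBF`. Here it is followed by `0x02`, which is why the codec reports an "invalid continuation byte". The helper name `find_invalid_utf8` is hypothetical; it relies only on the standard `UnicodeDecodeError` attributes `start`, `end`, and `reason`.

```python
def find_invalid_utf8(value: bytes):
    """Return None if value is valid UTF-8, else (position, bad bytes, reason)."""
    try:
        value.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the bytes the codec rejected;
        # exc.reason is the human-readable explanation.
        return exc.start, value[exc.start:exc.end], exc.reason


raw = b'\x00\x00\x00\x00\x10j\x00\x00\x07\xe8\x02Y'
print(find_invalid_utf8(raw))
```

If the check fails on data that was supposedly written as UTF-8 text, the writer most likely stored raw binary (for example, serialized numbers), not encoded strings.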