
I am executing Python code with the Apache NiFi ExecuteStreamCommand processor.

I am reading a CSV file whose encoding I know is Latin (Latin-1). So I read my file (a file stream object) with:

pd.read_csv(sys.stdin, encoding='latin')

But pandas keeps throwing this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 172: invalid continuation byte

So it seems that pandas ignores the given encoding parameter entirely and tries UTF-8 no matter what!
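A minimal, stdlib-only reproduction of the symptom may help pin down where the decoding happens. It assumes, as the error message suggests, that `sys.stdin` decodes as UTF-8; the stream is simulated with `io.TextIOWrapper` over `io.BytesIO`, and the `csv` module stands in for `pd.read_csv`. The helper names (`simulate_utf8_stdin`, `parse`, `reproduce`) are hypothetical.

```python
import csv
import io

# A tiny CSV encoded as Latin-1: "é" becomes the single byte 0xE9,
# which is not valid UTF-8 when followed by ",".
LATIN1_BYTES = "name,city\nRené,Orléans\n".encode("latin-1")


def simulate_utf8_stdin(raw: bytes) -> io.TextIOWrapper:
    """Hypothetical helper: mimic a sys.stdin whose text layer decodes as UTF-8."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")


def parse(text_stream) -> list:
    """Stand-in for pd.read_csv: the parser only ever sees decoded str."""
    return list(csv.reader(text_stream))


def reproduce() -> str:
    try:
        parse(simulate_utf8_stdin(LATIN1_BYTES))
    except UnicodeDecodeError as exc:
        # The error is raised by the wrapper's own UTF-8 decoder while the
        # parser iterates, so an encoding option on the parser cannot help.
        return f"{exc.encoding}: {exc.reason}"
    return "no error"


print(reproduce())  # utf-8: invalid continuation byte
```

The failing decoder belongs to the text stream, not to the parser, which is consistent with pandas appearing to ignore `encoding='latin'` when it is handed an already-decoded text stream.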

Any ideas? Thank you for your help.

Pdeuxa
  • These posts could be of some help: https://stackoverflow.com/q/18171739/11246056 and https://stackoverflow.com/a/61267213/11246056 – Laurent May 08 '21 at 06:33

1 Answer


I finally managed to find a solution.

My understanding is that the encoding argument never gets a chance to apply: sys.stdin is already a text stream that Python opens with its default encoding (UTF-8 here, as the error message shows), so the bytes are decoded as UTF-8 before pandas sees them. I therefore re-wrapped the underlying byte buffer of sys.stdin in a new text stream that uses the right encoding:

sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='latin')
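A self-contained sketch of the same fix, assuming the input is plain comma-separated Latin-1 bytes. `'latin'` and `'latin-1'` are both Python aliases for the ISO-8859-1 codec. The helper names (`rewrap_latin1`, `parse`) are hypothetical, an in-memory `io.BytesIO` replaces `sys.stdin.buffer` so the example runs anywhere, and the `csv` module stands in for pandas.

```python
import csv
import io


def rewrap_latin1(binary_stream) -> io.TextIOWrapper:
    """Hypothetical helper: wrap a raw byte stream in a Latin-1 text layer."""
    # newline="" hands line endings to the csv module, as its docs recommend.
    return io.TextIOWrapper(binary_stream, encoding="latin-1", newline="")


def parse(text_stream) -> list:
    """Stand-in for pd.read_csv: consumes already-decoded text."""
    return list(csv.reader(text_stream))


# In the NiFi script the equivalent would be:
#     sys.stdin = rewrap_latin1(sys.stdin.buffer)
#     df = pd.read_csv(sys.stdin)
sample = io.BytesIO("name,city\nRené,Orléans\n".encode("latin-1"))
rows = parse(rewrap_latin1(sample))
print(rows)  # [['name', 'city'], ['René', 'Orléans']]
```

Two alternatives that should have the same effect, though they are worth checking against your Python and pandas versions: call `sys.stdin.reconfigure(encoding='latin-1')` (Python 3.7+) before anything has been read from stdin, or pass the byte buffer straight to pandas with `pd.read_csv(sys.stdin.buffer, encoding='latin-1')`, so that pandas does the decoding itself.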