Some of my flow files in NiFi contain invalid UTF-8. When I read a flow file into a Python script through ExecuteStreamCommand, the script fails at sys.stdin.read() because that call assumes the input is valid UTF-8. I can't figure out how to coerce the input to UTF-8 as I read it in. I know I can call string.encode('utf-8').strip().decode('utf-8') on a string, but I'm not sure how to apply that to stdin.

I attempted:

#!/usr/bin/python3

import json
import re
import sys
import io

try:
    flow_file = sys.stdin.read()
    sys.stdout.write(str(flow_file))
except UnicodeDecodeError as e:
    flow_file = sys.stdin.encode('utf-8').strip().decode('utf-8')
    sys.stdout.write(str(flow_file))

but it failed with:

  File "/path/to/scripts/test_encoding.py", line 12, in <module>
    flow_file = sys.stdin.encode('utf-8').strip().decode('utf-8')
AttributeError: '_io.TextIOWrapper' object has no attribute 'encode'
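The AttributeError comes from the fact that sys.stdin is a text stream, not a string: the UTF-8 decoding happens inside read(), so there is nothing to call encode() on once it has failed. One way around it, sketched below, is to read the raw bytes from sys.stdin.buffer and decode them with an explicit error handler. This assumes that replacing each undecodable byte with U+FFFD is acceptable for the flow file; `sanitize_utf8` and `main` are hypothetical names, not part of NiFi or the standard library.

```python
#!/usr/bin/python3
import sys


def sanitize_utf8(raw: bytes, errors: str = "replace") -> bytes:
    """Return `raw` re-encoded as valid UTF-8 (hypothetical helper).

    With errors="replace", each undecodable byte becomes U+FFFD;
    with errors="ignore", it is dropped instead.
    """
    text = raw.decode("utf-8", errors=errors)
    return text.encode("utf-8")


def main() -> None:
    # Read and write bytes so Python never decodes implicitly.
    raw = sys.stdin.buffer.read()
    sys.stdout.buffer.write(sanitize_utf8(raw))
    sys.stdout.buffer.flush()
```

In the script that ExecuteStreamCommand runs, the last line would be a call to `main()`. Passing `errors="ignore"` drops the bad bytes rather than marking them, which loses the information that anything was wrong.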
carousallie
  • Does this answer your question? [Python 3: How to specify stdin encoding](https://stackoverflow.com/questions/16549332/python-3-how-to-specify-stdin-encoding) – daggett Mar 17 '20 at 19:44
  • @daggett I did see that and, unfortunately, I'm not totally clear on how it could work. I tried `sys.stdin.reconfigure(encoding='utf-8')` and it said `AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure'`. As for using `io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')` I'm not totally clear on how to coerce it to be utf-8 and write out a flow file with proper utf-8 encoding. – carousallie Mar 17 '20 at 20:08
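On the two suggestions from the linked question: `reconfigure()` was added to `io.TextIOWrapper` in Python 3.7, so the AttributeError suggests an older interpreter. The `io.TextIOWrapper(sys.stdin.buffer, ...)` form works on older versions too, and its `errors` argument is what keeps invalid bytes from raising. A minimal sketch, assuming replacement with U+FFFD is acceptable; `lenient_reader` and `copy_text` are hypothetical names:

```python
import io


def lenient_reader(binary_stream, errors="replace"):
    """Wrap a binary stream so reads decode as UTF-8 without raising.

    newline="" leaves line endings untranslated.
    """
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", errors=errors, newline=""
    )


def copy_text(src_binary, dst_binary):
    """Copy src to dst, emitting valid UTF-8 (hypothetical helper)."""
    reader = lenient_reader(src_binary)
    writer = io.TextIOWrapper(dst_binary, encoding="utf-8", newline="")
    writer.write(reader.read())
    writer.flush()
    # Detach so discarding the wrappers does not close the
    # underlying binary streams.
    reader.detach()
    writer.detach()
```

Under ExecuteStreamCommand the call would be `copy_text(sys.stdin.buffer, sys.stdout.buffer)`. On Python 3.7 or later, `sys.stdin.reconfigure(encoding="utf-8", errors="replace")` should give a similar result without re-wrapping the stream.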
