Some of my flow files in NiFi contain invalid UTF-8. When I read a flow file into a Python script via the ExecuteStreamCommand processor, sys.stdin.read() raises an error because it assumes the input is UTF-8. I can't figure out how to coerce the input to UTF-8 as I read it. With a string I can use string.encode('utf-8').strip().decode('utf-8'), but I'm not sure how to apply this to stdin.
I attempted
#!/usr/bin/python3
import json
import re
import sys
import io
try:
    flow_file = sys.stdin.read()
    sys.stdout.write(str(flow_file))
except UnicodeDecodeError as e:
    flow_file = sys.stdin.encode('utf-8').strip().decode('utf-8')
    sys.stdout.write(str(flow_file))
but it failed with:
  File "/path/to/scripts/test_encoding.py", line 12, in <module>
    flow_file = sys.stdin.encode('utf-8').strip().decode('utf-8')
AttributeError: '_io.TextIOWrapper' object has no attribute 'encode'
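
For context on the traceback: sys.stdin is a text wrapper that decodes bytes as it reads, so it has no encode method, and the decode error is raised inside read() itself. One possible way around this, offered as a minimal sketch and not a confirmed fix for this NiFi setup, is to read the raw bytes from the underlying binary stream and decode them with an error handler. The helper name copy_lossy is hypothetical, and the choice of errors="replace" (which substitutes U+FFFD for each invalid sequence) is an assumption about what is acceptable for the data.

```python
import io
import sys


def copy_lossy(src, dst, encoding="utf-8", errors="replace"):
    """Copy a binary stream to another, sanitising invalid byte sequences.

    copy_lossy is a hypothetical helper name. `src` and `dst` are binary
    streams, e.g. sys.stdin.buffer and sys.stdout.buffer.
    """
    raw = src.read()  # bytes: no decoding happens here, so no UnicodeDecodeError
    # Decode manually; with errors="replace" invalid sequences become U+FFFD.
    text = raw.decode(encoding, errors=errors)
    # Re-encode so the output is guaranteed to be valid in `encoding`.
    dst.write(text.encode(encoding))
    return text
```

In the script itself this would be called as copy_lossy(sys.stdin.buffer, sys.stdout.buffer). Passing errors="ignore" would drop the invalid bytes instead of replacing them. On Python 3.7 or later, calling sys.stdin.reconfigure(errors="replace") before sys.stdin.read() should have a similar effect while keeping the text-mode API.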