The issue is that you are calling decode on the flowfileList object, not the individual flowfiles.
In addition, you’ll need to actually access the flowfile content and then set the content with the new encoding. Right now you are treating the flowfile object as if it is a string, but it is not. I’m away from my computer but will have working example code later.
Update
I will provide working Python code to demonstrate this, but why can't you just use the ConvertCharacterSet processor? This accepts an input character set and output character set.
Here is working code that converts incoming flowfile content from UTF-16 to UTF-8. Content that is already UTF-8 should either be routed around this processor, or be detected in the script and passed through unchanged. You may also be interested in following NIFI-4550 - Add InferCharacterSet processor for the same behavior.
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_16)
        outputStream.write(bytearray(text.encode('utf-8')))
# end class

flowFileList = session.get(100)
if not flowFileList.isEmpty():
    for flowFile in flowFileList:
        flowFile = session.write(flowFile, PyStreamCallback())
        flowFile = session.putAttribute(flowFile, 'script_character_set', 'UTF-8')
        session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
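
For the point above about identifying content that is already UTF-8, here is a rough sketch of one way to do the check. It assumes the UTF-16 input starts with a byte order mark (BOM); UTF-16 without a BOM will not be detected and is passed through untouched. The helper name utf16_to_utf8_if_needed is made up for illustration and is not part of NiFi.

```python
import codecs

# Hypothetical helper (not a NiFi API): converts raw bytes to UTF-8 only
# when they begin with a UTF-16 byte order mark (BOM).
def utf16_to_utf8_if_needed(raw):
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The 'utf-16' codec uses the BOM to pick the byte order and drops it.
        return raw.decode('utf-16').encode('utf-8')
    # No UTF-16 BOM: assume the content is already UTF-8 and leave it alone.
    return raw
```

This is plain Python operating on a byte string. To use it inside process() you would first have to read the input stream into a Python byte string, which is not shown here. A BOM check is only a heuristic, so routing known UTF-8 flowfiles around the processor is still the safer option when you can do it.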