I'm trying to run Python code in the NiFi ExecuteStreamCommand processor.

The code uses modules that are not pure Python, such as pandas and NumPy, so NiFi's ExecuteScript processor is not an option.

The problem is reading the incoming flow file and modifying its content.

Apparently it is possible to read the incoming flow file from STDIN and write the result to STDOUT; see this SO question: Python Script using ExecuteStreamCommand

But I have not been able to get this working.

1. I tried simply reading a CSV from STDIN and modifying it, but when the result is sent to a PutFile processor the file is unchanged.

import sys
import pandas as pd
import io

df = pd.read_csv(io.StringIO(sys.stdin.read(1)))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df2 = df.append(df2)

2. I tried wrapping some other code in a function and returning the result, on the assumption that the function's return value would go to STDOUT, but the outcome was the same.

def convert_csv_dataframe():
    a = pd.read_csv(io.StringIO(sys.stdin.read(1)))
    a.replace(["ABC", "AB"], "A", inplace=True)
    return a

convert_csv_dataframe()

If anybody can help it would be most appreciated.

EDIT:

This code works. The issue was in NiFi: I was reading from the "original" relationship instead of the "output stream" relationship. Note that this reads only one line from stdin, but I don't think that should make a difference. The only question I have is: can I reference a flow file itself (not its contents) from ExecuteStreamCommand?

import sys

a = sys.stdin.readline()
a = a.upper()
sys.stdout.write(a)
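
For content with more than one line, `readline()` would pass only the first line through. A minimal sketch that handles every line, assuming the flow file content is text; the helper name `upper_lines` is made up for illustration:

```python
import sys


def upper_lines(lines):
    """Yield each input line converted to upper case."""
    for line in lines:
        yield line.upper()


if __name__ == "__main__":
    # Iterating over sys.stdin consumes the whole flow file content,
    # one line at a time, instead of stopping after the first line.
    sys.stdout.writelines(upper_lines(sys.stdin))
```
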
gary
  • Start from something simple: just print "hello world" in your Python script, call it with ExecuteStreamCommand, and you should see "hello world" in the NiFi flow file. – daggett Feb 19 '19 at 10:06

1 Answer

I think you need to write to STDOUT somewhere in your script. I don't know much Python, but both examples look like you read from STDIN and then modify data in memory, but never write it back out.
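
As a rough sketch of what writing back out could look like for the first example in the question: the function name `transform` is hypothetical, and `pd.concat` is used here in place of `DataFrame.append`, which was removed in pandas 2.0. Note also that `sys.stdin.read()` with no argument reads the whole content, whereas `read(1)` reads a single character.

```python
import io
import sys

import pandas as pd


def transform(csv_text: str) -> str:
    """Parse CSV text, append two rows, and return the result as CSV text."""
    df = pd.read_csv(io.StringIO(csv_text))
    extra = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))
    combined = pd.concat([df, extra], ignore_index=True)
    # With no path argument, to_csv returns the CSV as a string.
    return combined.to_csv(index=False)


if __name__ == "__main__":
    data = sys.stdin.read()  # entire flow file content
    if data.strip():  # skip empty input so read_csv does not raise
        # Whatever is written to STDOUT becomes the new content.
        sys.stdout.write(transform(data))
```

The modified content leaves the processor on the "output stream" relationship; the "original" relationship carries the unmodified flow file.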

Bryan Bende
  • Thanks @Bryan Bende, I had originally pushed to STDOUT but it did not work. I then realized that I was picking up the "original" relationship instead of the "output stream" relationship. Now I can modify the content of a flow file and write it out to local storage. The only question I have is this: how can I reference a flow file itself (not its content) in ExecuteStreamCommand? – gary Feb 20 '19 at 08:56
  • @gary I'm not an expert on the scripting processors, but I don't think ExecuteStreamCommand can reference a flow file itself; it's just a wrapper that allows you to access the content. I think you'd need to use ExecuteScript, which would give you access to the session, more like writing a regular processor. – Bryan Bende Feb 20 '19 at 14:16
  • I realize you said you can't use ExecuteScript, but I'm not sure why. You should be able to specify additional modules in the "Module Directory" property. – Bryan Bende Feb 20 '19 at 14:17
  • Thanks for the feedback, Bryan. I need to access the file itself, so ExecuteStreamCommand is a no-go. I tried to use ExecuteScript with pandas and NumPy in the module directory but ran into issues. – gary Feb 21 '19 at 09:29
  • Further reading suggests that modules which depend on native C libraries won't run in ExecuteScript (Jython): https://stackoverflow.com/questions/40744911/import-error-no-module-named-constant-time-while-accessing-server , http://apache-nifi-users-list.2361937.n4.nabble.com/Is-it-possible-to-reference-python-requests-module-in-ExecuteScript-td1827.html , https://stackoverflow.com/questions/19455100/can-i-run-numpy-and-pandas-with-jython . I have to conclude that pandas/NumPy code simply will not work inside ExecuteScript :( – gary Feb 21 '19 at 09:29
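
On referencing the flow file itself from ExecuteStreamCommand: one possible partial workaround, as far as I know, is to pass individual attribute values to the script through the processor's Command Arguments property, which accepts Expression Language such as `${filename}`. This gives the script read-only access to those values and nothing more; it does not expose the session or the flow file object. The sketch below assumes the `filename` attribute arrives as the first argument; `tag_lines` is a made-up helper for illustration.

```python
import sys


def tag_lines(name, lines):
    """Yield a header line naming the source, then the lines unchanged."""
    yield f"# source: {name}\n"
    yield from lines


# Assumption: NiFi passes the flow file's filename attribute as argv[1].
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.writelines(tag_lines(sys.argv[1], sys.stdin))
```
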