
Using Pentaho PDI 6, with:

A) CSV Input on a .csv file (4 rows, from IBM), with ASCII file encoding (Preview Rows works fine)

connected to

B) CPython Script Executor, installable from Tools -> Marketplace. Assumes Python, pandas, and NumPy are installed. Script settings:

Configure, Input Frames: (previous step), df
Python Script, Manual Python Script: `df.replace(to_replace= "\[|\]|'|\"", value='', regex=True, inplace=True)`
Output Fields, Output Fields: (column names, string type)
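To confirm that the pandas side of this setup behaves as expected independently of PDI, here is a minimal sketch that loads a small CSV into a DataFrame and runs the same replace. The column names and values are hypothetical stand-ins, not the IBM file. The pattern is written as a raw string, which matches the same characters as the step's script but avoids invalid-escape warnings in newer Python versions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the 4-row IBM CSV; not the real file contents.
CSV_TEXT = """name,tags
Alice,['x']
Bob,[y]
"""

# Decode as ASCII, mirroring the CSV Input step's file encoding setting.
df = pd.read_csv(io.BytesIO(CSV_TEXT.encode("ascii")), encoding="ascii")

# Same pattern as the step's script, as a raw string: strips [, ], ' and ".
df.replace(to_replace=r"\[|\]|'|\"", value="", regex=True, inplace=True)

print(df)
```

Inside the CPython Script Executor, `df` would instead be supplied by the Input Frames setting. If this runs cleanly from the command line, the script itself is unlikely to be the problem, which fits the stack trace: the failure is in `rowsToPythonDataFrame`, before the script would run.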

throws

2016/07/25 10:45:21 - CPython Script Executor.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Unexpected error
2016/07/25 10:45:21 - CPython Script Executor.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : java.lang.NullPointerException
2016/07/25 10:45:21 - CPython Script Executor.0 -  at org.pentaho.python.PythonSession.rowsToPythonDataFrame(PythonSession.java:389)
2016/07/25 10:45:21 - CPython Script Executor.0 -  at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.rowsToPyDataFrame(CPythonScriptExecutor.java:458)
2016/07/25 10:45:21 - CPython Script Executor.0 -  at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.processBatch(CPythonScriptExecutor.java:276)
2016/07/25 10:45:21 - CPython Script Executor.0 -  at org.pentaho.di.trans.steps.cpythonscriptexecutor.CPythonScriptExecutor.processRow(CPythonScriptExecutor.java:243)
2016/07/25 10:45:21 - CPython Script Executor.0 -  at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2016/07/25 10:45:21 - CPython Script Executor.0 -  at java.lang.Thread.run(Unknown Source)
2016/07/25 10:45:21 - CPython Script Executor.0 - Finished processing (I=0, O=0, R=3, W=0, U=0, E=1)

Earlier debugging suggested that processRow might not be able to determine the metadata type, but this error doesn't indicate that.

Question: What's the proper way to set up a scripting task that reads in a .csv without throwing NullPointerExceptions?

EDIT - The error also reproduces with the source materials. See Mark Hall, CPython Scripting, and the example .zip file that accompanies it.

EDIT 1 - Running `python` in the Command Prompt gives:

C:\Users\*****>python

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:01:18) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

I am not running Anaconda (too heavyweight), and my Python version is one minor version ahead, which might be affecting things. Still, I would hope the plugin is Python-version agnostic unless the Python binary interface (ABI) changed.
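A small stdlib-only script can collect the interpreter details that matter here in one go: the exact version, whether the build is 32- or 64-bit (the banner above shows a 32-bit build), and whether pandas and NumPy import. This is a sketch that assumes PDI launches the same `python` found on the PATH, which it does not verify. The 3.4/2.7 check reflects the plugin documentation as quoted in the comments, not a confirmed requirement.

```python
import importlib
import struct
import sys

# Versions the plugin docs reportedly list (per the comments); an assumption.
DOCUMENTED_VERSIONS = ((2, 7), (3, 4))


def environment_report():
    """Collect interpreter details relevant to the CPython Script Executor."""
    report = {
        "python_version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
        "pointer_bits": struct.calcsize("P") * 8,
        "executable": sys.executable,
        "matches_documented_version": tuple(sys.version_info[:2]) in DOCUMENTED_VERSIONS,
    }
    for module_name in ("pandas", "numpy"):
        try:
            module = importlib.import_module(module_name)
            report[module_name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[module_name] = None  # not importable from this interpreter
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```

Running it with the same interpreter on both a working and a failing machine makes the differences (version, bitness, library versions) easy to compare.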

EDIT 2 - I can't attach the Kettle file, but the example files from Mark Hall above reproduce the same issue I encountered.

Kwame
  • From [Examining the code](https://github.com/pentaho-labs/pentaho-cpython-plugin/blob/master/src/org/pentaho/python/PythonSession.java#L389), it seems likely that `m_localSocket` is `null`. I can't see how that would occur if the session started without any error logs further up, as the `launchServer` method explicitly shuts down if `m_localSocket` is null. I note in the bounty text that you mention Python 3.5 - the docs for the CPython Script Executor explicitly state 3.4 (or 2.7). That could be a problem. Caveat - not a Pentaho user, just browsing and offering a suggestion. – J Richard Snape Jul 29 '16 at 15:21
  • Can you attach your .ktr file as XML? I could reproduce the same error here. It would be easier if I could go through your .ktr. – Marlon Abeykoon Aug 03 '16 at 04:53
  • I don't know what to tell you. For me, it just worked out of the box with latest PDI 6, latest Pandas (from Anaconda distribution, as suggested by tutorial) and latest Numpy. Everything 64 bit on Windows 7. Sorry for asking, but what's the console output if you enter `python`? – Juergen Aug 03 '16 at 13:12
  • I am also facing the same issue. The output of the `python` command on the console is `Python 3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin Type "help", "copyright", "credits" or "license" for more information.` Any suggestions? – Sarang Manjrekar Dec 19 '16 at 15:20
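The first comment suspects that `m_localSocket` is `null`, i.e. the Java side never obtained a working connection to the Python process. One rough way to rule out a local networking problem (for example, a firewall blocking loopback connections) is to check that the interpreter can open and use a loopback TCP connection. This sketch assumes the plugin talks to Python over a localhost TCP socket, which the field name suggests but this thread does not confirm; it does not reproduce the plugin's actual protocol.

```python
import socket


def loopback_round_trip(payload=b"ping", timeout=5.0):
    """Listen on 127.0.0.1, connect to it, and send one message across."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        server.settimeout(timeout)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                received = b""
                # recv may return partial data, so read until the payload is complete.
                while len(received) < len(payload):
                    chunk = conn.recv(len(payload) - len(received))
                    if not chunk:
                        break
                    received += chunk
    return received


if __name__ == "__main__":
    print(loopback_round_trip())
```

If this fails or times out on the affected machine, local socket setup is worth investigating before the plugin itself.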
