I have a Python UDF that uses lxml. My Pig job that uses the UDF fails:

File "PigParse.py", line 10, in ParseToPig
ImportError: No module named lxml

The Python script works fine as a standalone program. Line 10 is:

from lxml import etree 

Do I need to distribute lxml to the Hadoop cluster somehow? If so, how, and which version should I use?

I have seen examples of distributing nltk using Hadoop -file but nothing for Pig.

TIA!!!

schoon

1 Answer

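I think the problem is that I'm registering the UDF with Jython:

`REGISTER 'PigParse.py' using jython as PP;`

and you can't use lxml with Jython. lxml is a C extension built on libxml2 and libxslt, and Jython runs on the JVM, so it cannot load CPython C extensions. Distributing lxml to the cluster would not help while the UDF runs under Jython.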
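One possible workaround, if the UDF only needs basic parsing, is to fall back to the standard library's ElementTree when lxml cannot be imported. The sketch below is an assumption about what such a UDF might do, not the original PigParse.py: `extract_titles` and the `<title>` element are hypothetical names chosen for illustration, and the code sticks to the subset of the API that lxml and ElementTree share.

```python
# Prefer lxml where it is installed (CPython). Under Jython the import
# fails, so fall back to the pure-Python ElementTree from the standard library.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def extract_titles(xml_text):
    """Return the text of every <title> element below the root.

    Hypothetical helper for illustration; it is not part of the
    original PigParse.py.
    """
    root = etree.fromstring(xml_text)
    # findall with a ".//tag" path is supported by both lxml and ElementTree.
    return [element.text for element in root.findall(".//title")]
```

This only helps if the script avoids lxml-only features such as full XPath or XSLT. If those are required, newer Pig releases (0.12 and later, as far as I know) can run Python UDFs under CPython with `USING streaming_python`, in which case lxml would have to be installed on every task node.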

schoon