How do I read static files in a PIG UDF

Question

I am new to Pig and Hadoop. I have written a Pig UDF that takes a string and returns a string. The UDF uses a class from an existing jar that contains the business logic. That class's constructor takes two filenames, which it uses to build a dictionary for processing the input. When I pass the filenames in Pig local mode it works fine, but I don't know how to make it work in mapreduce mode. Can the distributed cache solve the problem?

Here is my code:

REGISTER tokenParser.jar;

REGISTER sampleudf.jar;


DEFINE TOKENPARSER com.yahoo.sample.ParseToken('conf/input1.txt','conf/input2.xml');

A = LOAD './inputHOP.txt' USING PigStorage() AS (tok:chararray);
B = FOREACH A GENERATE TOKENPARSER(tok);
STORE B into 'newTokout' USING PigStorage();

From what I understand, tokenParser.jar must be reading the files with some sort of buffered file reader. Is it possible to make it work without changing tokenParser.jar?

score 1 · Accepted Answer · edited May 23 '17 at 11:52

Yes. As in this similar question, using the distributed cache is a good way to solve this problem.
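
One way this might look without touching tokenParser.jar is sketched below. It rests on several assumptions: the two files have already been copied to HDFS (the `/user/me/conf/...` paths are hypothetical), the Pig version in use passes `set` properties through to the Hadoop job configuration, and the class opens the filenames it is given relative to the current working directory. The `#name` suffix asks Hadoop to symlink each cached file into the task's working directory under that name, so the constructor can be given plain relative names.

```pig
-- Hypothetical HDFS locations; upload the files first, e.g. with `hadoop fs -put`.
-- Each entry has the form <hdfs path>#<symlink name in the task working directory>.
set mapred.cache.files '/user/me/conf/input1.txt#input1.txt,/user/me/conf/input2.xml#input2.xml';
set mapred.create.symlink 'yes';

REGISTER tokenParser.jar;
REGISTER sampleudf.jar;

-- Pass the symlink names, not the original conf/ paths.
DEFINE TOKENPARSER com.yahoo.sample.ParseToken('input1.txt','input2.xml');

A = LOAD './inputHOP.txt' USING PigStorage() AS (tok:chararray);
B = FOREACH A GENERATE TOKENPARSER(tok);
STORE B INTO 'newTokout' USING PigStorage();
```

Pig may also construct the UDF on the client while it plans the job, so copies of the two files under the same relative names may need to exist in the local directory the script is launched from. If `set` does not forward these properties in your version, passing them as `-D` options on the `pig` command line is an alternative worth trying. Later Pig releases also let a UDF declare its cached files by overriding `EvalFunc.getCacheFiles()`, which would mean changing the UDF in sampleudf.jar but still not tokenParser.jar.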


answered Feb 25 '11 at 21:04

Romain
