
I want to load a text file into Pig and then store it as an RCFile. For this, I found that Twitter provides a storage UDF at this link:

http://grepcode.com/file/repo1.maven.org/maven2/com.twitter.elephantbird/elephant-bird-rcfile/3.0.8/com/twitter/elephantbird/pig/store/RCFilePigStorage.java

Can someone tell me how to compile it and use it in my Pig code?

Aniket Kulkarni

1 Answer


Put all the Twitter (Elephant Bird) dependencies and the Pig jars on the classpath and compile RCFilePigStorage.java. If you want to change some specific behavior, make those changes in the source as well; you can then rename the class and its file to MyRCFilePigStorage.java.

Now take the class files generated by the compilation and package them into a jar file named MyRCUdf.jar. Register this jar in your Pig script.

REGISTER MyRCUdf.jar;
-- your Pig logic
STORE data INTO 'output_path' USING MyRCFilePigStorage();
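
For context, a complete script might look roughly like the sketch below. It is only an illustration under assumptions: the input path, output path, delimiter, and schema are hypothetical placeholders, and it assumes MyRCFilePigStorage was compiled without a package declaration (otherwise the fully qualified class name is needed in the USING clause). The Elephant Bird and Hive jars that the storage class depends on will most likely also have to be registered at runtime; they are not listed here because their exact names depend on your setup.

```pig
-- Jar built from the compiled class files, as described above.
REGISTER MyRCUdf.jar;

-- Hypothetical input: a tab-delimited text file with two columns.
data = LOAD 'input.txt' USING PigStorage('\t') AS (id:int, name:chararray);

-- Hypothetical output directory; STORE takes a relation alias, not a quoted string.
STORE data INTO 'output_rc' USING MyRCFilePigStorage();
```

If the class keeps its original package, the last line would instead reference com.twitter.elephantbird.pig.store.RCFilePigStorage (the name shown in the linked source) or whatever package you compiled it under.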

EDIT: See the following links for the Twitter dependencies. Take the source code, compile it, and include the generated classes in your classpath.

https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/MapReduceInputFormatWrapper.java

https://github.com/kevinweil/elephant-bird

Prabha Satya
  • The import statements below give errors during compilation, and I don't know exactly where I can get those classes. Any idea how to solve this? In fact, I would like to know of any alternative method for using RCFiles in Pig. Thanks. 1: import com.twitter.elephantbird.mapreduce.input.MapReduceInputFormatWrapper; 2: import com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat; –  Jan 22 '14 at 12:50
  • @HemantReddy I have edited the answer to cover your Twitter dependencies. – Prabha Satya Jan 22 '14 at 16:07