I am new to Pig and want to use it to load data from a path. The path is dynamic and is stored in a text file, say pigInputPath.txt. In the Pig script, I plan to do the following:

First load the path using:

InputPath = Load 'pigInputPath.txt' USING PigStorage();

Second load data from the path using:

Data = Load 'someprefix' + InputPath + 'somepostfix' USING PigStorage();

But this does not work. I also tried CONCAT, but that gives me an error as well. Can someone help me with this? Thanks a lot!

Rocking chief

1 Answer

First, pass your input path to the script as a parameter. (References: Hadoop Pig: Passing Command Line Arguments, https://wiki.apache.org/pig/ParameterSubstitution)

Let's say you invoke your script as pig -f script.pig -param inputPath=blah
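
Since the question keeps the path in pigInputPath.txt, the parameter value can be read from that file on the command line. The following is only a rough sketch: it assumes the file holds a single path on its first line, and both the function name run_pig_with_path_file and the PIG_CMD override are hypothetical names made up for illustration (PIG_CMD just lets you swap in another command for a dry run).

```shell
#!/bin/sh
# Sketch: read the dynamic path from a text file and pass it to Pig as a parameter.
# Assumption: the file contains one path on its first line.
# PIG_CMD is a hypothetical override; it defaults to the real pig command.
run_pig_with_path_file() {
    path_file="$1"
    # Take the first line, drop any carriage return, and trim surrounding whitespace.
    input_path="$(head -n 1 "$path_file" | tr -d '\r' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
    # Same invocation as above, with the value taken from the file.
    ${PIG_CMD:-pig} -f script.pig -param "inputPath=$input_path"
}

# Usage: run_pig_with_path_file pigInputPath.txt
```

Setting PIG_CMD=echo before calling the function prints the arguments instead of running Pig, which is a quick way to check what value would be substituted.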

You could then LOAD from that path, with the required prefix and postfix, as follows:

Data = LOAD 'someprefix$inputPath/somepostfix' USING PigStorage();

The catch with the somepostfix string is that it needs to be separated from the parameter by a / or another special character, so that Pig can tell the string is not part of the parameter name.

One way to avoid the special character is to define the prefix and postfix as parameters as well:

%default prefix 'someprefix'
%default postfix 'somepostfix'
Data = LOAD '$prefix$inputPath$postfix' USING PigStorage();
Rajeev Atmakuri