Please help me out; I have spent many hours on this.

I have files in a folder, and I want them to be loaded in the order of their file names.

I have even gone to the extent of writing Java code to rename the files so that they match the format used in the guides at the following links.

  1. Load multiple files in pig
  2. Pig Latin: Load multiple files from a date range (part of the directory structure)
  3. http://netezzaadmin.wordpress.com/2013/09/25/passing-parameters-to-pig-scripts/

I am using Pig 11.0.

In my script.pig:

    set io.sort.mb 10;
    REGISTER 'path_to/lib/pig/piggybank.jar';

    data_ = LOAD '$input' USING org.apache.pig.piggybank.storage.XMLLoader('Data') AS (data_:chararray);
    DUMP data_;

In the shell:

    [root@servername currentfolder]# pig -x local script.pig -param input=/20131217/{1..10}.xml

Error returned:

    [main] ERROR.org.apache.pig.Main - ERROR 2999: Unexpected error. Undefined parameter : input
  • Try specifying "-param input=..." before "script.pig" on the command line. If that doesn't work, also try quoting the value: input="...". – Ruslan Dec 17 '13 at 14:02
  • Your solution worked, except for the {1..10} part! It's OK when I type {1,2,3,4,5,6,7,8,9,10} but not {1..10}. Any idea? It gives ERROR 2244 now. – FailedMathematician Dec 17 '13 at 14:49
  • OK, according to the 2nd link I included, the expansion of {1..10} is done by the Linux shell, not by the HDFS API. Any suggestions or advice? – FailedMathematician Dec 17 '13 at 15:21
  • Did you try quoting as I suggested? I thought it would disable the Linux expansion. There might also be limitations in Hadoop's globs. Check whether the exact glob is supported here: http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA65&lpg=PA65&dq=hadoop+file+globs&source=bl&ots=i7BTSyPfXs&sig=BjABL619LfWkU_FT7d9xDyN8yDY&hl=en&sa=X&ei=_dmwUpLGNIfk4QT36IHIBQ&ved=0CG0Q6AEwBA#v=onepage&q=hadoop%20file%20globs&f=false – Ruslan Dec 17 '13 at 23:12
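
Putting the comments above together, here is a minimal sketch of one way to build the parameter value. It assumes the comma-separated form {1,2,...,10} is accepted (as reported above) and that the files are named 1.xml through 10.xml under /20131217/. The variable names `ids` and `input_glob` are made up for illustration. It only addresses the undefined parameter and the {1..10} expansion, not the order in which Pig reads the matched files.

```shell
# Build the list "1,2,...,10" explicitly, because {1..10} is a shell
# brace expansion and is not understood by the Hadoop glob matcher.
ids=""
i=1
while [ "$i" -le 10 ]; do
  ids="${ids:+$ids,}$i"   # append with a comma, except before the first item
  i=$((i + 1))
done

# Double quotes keep the braces literal so the shell does not expand them.
input_glob="/20131217/{${ids}}.xml"
echo "$input_glob"

# Then pass -param BEFORE the script name, with the value quoted:
# pig -x local -param input="$input_glob" script.pig
```

The pig invocation is left as a comment so the snippet can be run on a machine without Pig just to inspect the generated value.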

1 Answer

I don't know why you are using input parameters.

For example, to load every file in the folder MyFolder/CurrentDate/ (where the date is in YYYYMMDD format), I use the following script:

    %default DATE `date +%Y%m%d`;
    x_basic_table = LOAD '/MyFolder/$DATE';

Have a nice day.

Radek Tomšej
  • Thanks! I have multiple records with the same date and time, and each record is stored in its own file. The only way to know the ordering is through the file name, so I need the files to be loaded according to the digits in their names (the first entry must come from the file with the smallest number). – FailedMathematician Dec 17 '13 at 15:00