
I would like to know how to load only some of the files in a directory in a Pig script.

Let's say there are 4 files in a directory for January, named as follows:

 2016-01-01.txt
 2016-01-02.txt
 2016-01-03.txt
 2016-01-04.txt

Now I need to read the files from 2016-01-01 to 2016-01-03, that is, the first 3 files of January 2016.

My Pig script:

The following line works:

rec = LOAD '/home/dir/{2016-01-01*,2016-01-02*,2016-01-03*}' USING PigStorage(',');

The following line does not work:

rec = LOAD '/home/dir/{2016-01-{01*-03*}}' USING PigStorage(',');

I get the error below. I am using Pig 0.14 on a MapR cluster.

N/A     file_records    MAP_ONLY        Message:     org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern maprfs:///home/dir/{2016-01-{01*-03*}} matches 0 files. Paths with components .*, _* were skipped. 
0 additional path filters were applied

Could somebody explain what happened and how I can resolve this?

Surender Raja

1 Answer


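Possible duplicate: Load mutilple files over a date range in PIG

In a Hadoop glob (which is what Pig applies to LOAD paths), braces only list comma-separated alternatives, and '-' means a range only inside square brackets. So '{01*-03*}' is a single alternative, and the pattern only matches names that contain '-03' somewhere after '2016-01-01'. None of your files do, hence "matches 0 files". Use one of these instead: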
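One way to see what happened is to translate the glob into a regular expression and test it against the four file names. The sketch below is a simplified, hypothetical emulation in Python: 'glob_to_regex' and 'matches' are illustrative helpers, not part of Pig or Hadoop. It handles only '*', '?', '[...]' and nested '{a,b}', and it matches bare file names rather than full maprfs:// paths.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> str:
    """Translate a simplified Hadoop-style glob into an anchored regex.

    Supports *, ?, [...] character classes and (nested) {a,b} alternation.
    Illustrative only: the real Hadoop glob implementation has more rules.
    """
    out = []
    depth = 0  # how many {...} groups are currently open
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "[":
            j = pattern.index("]", i + 1)  # copy the class, e.g. [1-3], verbatim
            out.append(pattern[i : j + 1])
            i = j
        elif c == "{":
            depth += 1
            out.append("(?:")
        elif c == "}" and depth > 0:
            depth -= 1
            out.append(")")
        elif c == "," and depth > 0:
            out.append("|")  # a comma separates alternatives inside braces
        else:
            out.append(re.escape(c))  # everything else, including '-', is literal
        i += 1
    return "^" + "".join(out) + "$"


def matches(pattern: str, names: List[str]) -> List[str]:
    """Return the names that the (simplified) glob pattern matches."""
    rx = re.compile(glob_to_regex(pattern))
    return [n for n in names if rx.match(n)]


FILES = ["2016-01-01.txt", "2016-01-02.txt", "2016-01-03.txt", "2016-01-04.txt"]

if __name__ == "__main__":
    # Working pattern from the question: three alternatives.
    print(matches("{2016-01-01*,2016-01-02*,2016-01-03*}", FILES))
    # -> ['2016-01-01.txt', '2016-01-02.txt', '2016-01-03.txt']

    # Failing pattern: {01*-03*} is ONE alternative, so '-03' must appear literally.
    print(matches("{2016-01-{01*-03*}}", FILES))
    # -> []
```

The same helper reports the first three files for each of the answer's patterns, because they use either comma alternatives or a '[1-3]' character class.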

rec = LOAD '/home/dir/{2016-01-0{1,2,3}*}' USING PigStorage(',');

or

rec = LOAD '/home/dir/{2016-01-{01,02,03}*}' USING PigStorage(',');

or

rec = LOAD '/home/dir/{2016-01-0[1-3]*}' USING PigStorage(',');
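A character class such as '[1-3]' only covers a single character position, so it cannot express a range like 2016-01-30 to 2016-02-02. One workaround is to generate the brace list (the same shape as the working LOAD line in the question) outside Pig. Below is a minimal Python sketch; 'date_range_glob' is a hypothetical helper, and the trailing '*' assumes every file name starts with its ISO date.

```python
from datetime import date, timedelta


def date_range_glob(directory: str, start: date, end: date) -> str:
    """Return a Hadoop-style glob listing every day from start to end, inclusive.

    Hypothetical helper: assumes file names begin with the ISO date,
    e.g. 2016-01-01.txt, so each alternative is "<date>*".
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = []
    d = start
    while d <= end:
        days.append(d.isoformat() + "*")  # e.g. "2016-01-01*"
        d += timedelta(days=1)
    # {a*,b*,c*} is brace alternation, as in the question's working LOAD line
    return directory.rstrip("/") + "/{" + ",".join(days) + "}"


if __name__ == "__main__":
    path = date_range_glob("/home/dir", date(2016, 1, 1), date(2016, 1, 3))
    print(f"rec = LOAD '{path}' USING PigStorage(',');")
    # -> rec = LOAD '/home/dir/{2016-01-01*,2016-01-02*,2016-01-03*}' USING PigStorage(',');
```

The generated path could then be handed to the script through Pig's parameter substitution (pig -param INPUT=... with LOAD '$INPUT' in the script). Quote the value so the shell does not expand the braces itself.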
nobody