
I have been using this statement to load one text file. How can I load multiple text files at once?

A = LOAD '1try.txt' USING PigStorage(' ') as (c1:chararray,c2:chararray,c3:chararray,c4:chararray);
Ani Menon
user3627159
  • possible duplicate of [Pig Latin: Load multiple files from a date range (part of the directory structure)](http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-structur) – reo katoa May 13 '14 at 11:41

3 Answers


You can use a folder name instead of a file name, like this:

A = LOAD 'myfolder' USING PigStorage(' ') 
    AS (c1:chararray,c2:chararray,c3:chararray,c4:chararray);

Pig will load all files in the specified folder, as stated in Programming Pig:

When specifying a “file” to read from HDFS, you can specify directories. In this case, Pig will find all files under the directory you specify and use them as input for that load statement. So, if you had a directory input with two datafiles today and yesterday under it, and you specified input as your file to load, Pig will read both today and yesterday as input. If the directory you specify has other directories, files in those directories will be included as well.
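To make the quoted behavior concrete, here is a minimal sketch using the `input` directory from the quote. The files `today` and `yesterday` are the book's example, the four-column schema is borrowed from the question, and the relation names `everything` and `total` are only illustrative. Grouping everything and counting it is just one way to check that the records from both files end up in a single relation:

```pig
-- Assumed HDFS layout (the example from the quote):
--   input/today
--   input/yesterday
-- Pointing LOAD at the directory reads every file in it into one relation.
A = LOAD 'input' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- Put all records from both files into a single group and count them.
everything = GROUP A ALL;
total = FOREACH everything GENERATE COUNT_STAR(A);
DUMP total;
```

COUNT_STAR is used rather than COUNT so that records whose first field is null are still counted.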

Andrey Sozykin
  • Thanks AndreyS. I had tried this, but I'm still getting an error saying it failed to read data from the target directory. Any suggestions? – user3627159 May 13 '14 at 13:32
  • It works for me. Maybe you are using the wrong path to the target directory? Are you running Pig in local or MapReduce mode? – Andrey Sozykin May 14 '14 at 07:10
  • I'm running Pig in MapReduce mode. By the way, when specifying directories, should I write "/user/asiapac/ssamykannu/user/asiapac/ssamykannu" or "hdfs://localhost:9100/user/asiapac/ssamykannu/user/asiapac/ssamykannu"? I have three text files stored in the folder "ssamykannu". – user3627159 May 15 '14 at 07:11
  • "/user/asiapac/ssamykannu/user/asiapac/ssamykannu" is a very strange path. You probably typed it twice, and the real path is "/user/asiapac/ssamykannu". You can check this with the `hadoop fs -ls` command. If your Hadoop username is "asiapac", you can use either the path relative to your home directory, "ssamykannu", or the full path "/user/asiapac/ssamykannu". A fully qualified URI such as "hdfs://localhost:9100/user/asiapac/ssamykannu" will work too (keep the NameNode port in the URI if it is not the default). – Andrey Sozykin May 15 '14 at 09:09
  • Yes, there was a small mix-up with the path. I created a new folder and it worked. Thanks a lot, AndreyS. – user3627159 May 21 '14 at 10:37
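Following Andrey's comment about paths, here is a sketch of the three ways to point LOAD at the same HDFS folder. It assumes the Hadoop user is `asiapac` (so the HDFS home directory is `/user/asiapac`) and that the NameNode listens on `localhost:9100`, as in the comments above. Adjust both for your own cluster:

```pig
-- Assumption: Hadoop user "asiapac", home directory /user/asiapac,
-- NameNode at localhost:9100. All three relations read the same folder.

-- 1. Relative path, resolved against the user's HDFS home directory
A = LOAD 'ssamykannu' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- 2. Absolute path on the default file system
B = LOAD '/user/asiapac/ssamykannu' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- 3. Fully qualified URI, including the NameNode host and port
C = LOAD 'hdfs://localhost:9100/user/asiapac/ssamykannu' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);
```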

Here is the link to the official Pig documentation, which states that the LOAD statement can load all the files in a directory: http://pig.apache.org/docs/r0.14.0/basic.html#load

Syntax: LOAD 'data' [USING function] [AS schema];

Where: 'data': The name of the file or directory, in single quotes. If you specify a directory name, all the files in the directory are loaded.
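Besides a single file or a whole directory, Pig also accepts Hadoop glob patterns in the 'data' argument (the syntax of Hadoop's `FileSystem.globStatus`), which lets you pick a subset of files. The sketch below assumes a hypothetical directory `/data/logs` holding one file per day. The paths and aliases are for illustration only:

```pig
-- Hypothetical layout, for illustration only:
--   /data/logs/2014-05-01.txt
--   /data/logs/2014-05-02.txt
--   /data/logs/2014-06-01.txt

-- One file
one_day = LOAD '/data/logs/2014-05-01.txt' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- Every file in the directory
all_days = LOAD '/data/logs' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- Only files whose names match a glob (here: every day in May 2014)
may_days = LOAD '/data/logs/2014-05-*.txt' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- An explicit set of files, using glob alternation {a,b}
picked = LOAD '/data/logs/{2014-05-02,2014-06-01}.txt' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);
```

Because the pattern is expanded by Hadoop, running `hadoop fs -ls '/data/logs/2014-05-*.txt'` (quoted so the local shell does not expand it) is a quick way to preview which files a glob will match.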

  • Hello, what if the directory has many subdirectories and each subdirectory has multiple files? Would this syntax also load all of those files? – Jeremiah Sep 10 '16 at 04:16
data = load '/FOLDER/PATH' using PigStorage(' ') AS (<name>:<type>, ...);

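Or, if the data lives in an HBase table rather than in files on HDFS, use HBaseStorage. It takes an HBase table name (not a folder path) and a list of columns to read:

data = load 'hbase://TABLE_NAME' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('<family>:<column> ...');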
Ani Menon