Pig is not running in mapreduce mode (Hadoop 3.1.1 + Pig 0.17.0)

Question

I am very new to Hadoop. My Hadoop version is 3.1.1 and my Pig version is 0.17.0.

Everything works as expected when I run this script in local mode:

pig -x local

grunt> student = LOAD '/home/ubuntu/sharif_data/student.txt' USING PigStorage(',') as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
grunt> DUMP student;

Result for local mode

But with the same input file and Pig script, mapreduce mode fails.

pig -x mapreduce

grunt> student = LOAD '/pig_data/student.txt' USING PigStorage(',') AS ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
grunt> STORE student INTO '/pig_data/student_out' USING PigStorage (',');

OR

grunt> student = LOAD 'hdfs://NND1:9000/pig_data/student.txt' USING PigStorage(',') AS ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
grunt> STORE student INTO 'hdfs://NND1:9000/pig_data/student_out' USING PigStorage (',');

Result for mapreduce mode

Note: student.txt was uploaded to HDFS successfully:

hdfs dfs -ls /pig_data
Found 2 items
-rw-r--r--   3 ubuntu supergroup     861585 2019-07-12 00:55 /pig_data/en.sahih.txt
-rw-r--r--   3 ubuntu supergroup        234 2019-07-12 12:25 /pig_data/student.txt

Even from within grunt, this command reads the HDFS file correctly:

grunt> fs -cat /pig_data/student.txt

Why does it say it failed to read data when the file exists at that path?
What could I be missing?

Any help is appreciated.

Are you sure it's not checking locally? You haven't specified HDFS in the URL. Notice how the full HDFS URL is used here https://www.tutorialspoint.com/apache_pig/apache_pig_grunt_shell — Ben Watson, Jul 12 '19 at 08:57
Why would I need the HDFS path, since Pig can resolve it through fs? FYI, I have tried with the HDFS path too :( — sharif2008, Jul 12 '19 at 09:21
Does the job run if you point it to the same file stored locally? — Ben Watson, Jul 12 '19 at 09:32
Yes. In local mode, my local files run successfully. The only problem is that it fails in mapreduce mode with errors like this: *Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.* — sharif2008, Jul 12 '19 at 09:42

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12


Part of the problem is that Pig 0.17 doesn't support Hadoop 3 yet.

The Apache Pig Releases page states for 0.17:

19 June, 2017: release 0.17.0 available

The highlights of this release is the introduction of Pig on Spark

Note: This release works with Hadoop 2.X (above 2.7.x)

And JIRA PIG-5253 (Pig Hadoop 3 support) is still in progress.
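As a quick pre-flight check before trying mapreduce mode, the release-note requirement above can be turned into a small POSIX shell sketch. The helper name `pig017_supports_hadoop` is hypothetical (it is not part of Pig or Hadoop), and reading "above 2.7.x" as "any 2.x release from 2.7 on" is an assumption; the only real commands it relies on are `hadoop version`, `head` and `awk`.

```shell
#!/bin/sh
# Sketch: does a Hadoop version fall in the range the Pig 0.17.0 release
# note names ("Hadoop 2.X (above 2.7.x)")?
# Assumption: that wording is read here as "major 2, minor >= 7".

# Hypothetical helper, not part of Pig or Hadoop.
pig017_supports_hadoop() {
  version=$1
  major=${version%%.*}   # "3.1.1" -> "3"
  rest=${version#*.}     # "3.1.1" -> "1.1"
  minor=${rest%%.*}      # "1.1"   -> "1"
  # Reject an empty or non-numeric minor version.
  case $minor in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" = "2" ] && [ "$minor" -ge 7 ]
}

# Only runs where the hadoop CLI is installed; the first line of
# `hadoop version` looks like "Hadoop 3.1.1".
if command -v hadoop >/dev/null 2>&1; then
  hv=$(hadoop version | head -n 1 | awk '{print $2}')
  if pig017_supports_hadoop "$hv"; then
    echo "Hadoop $hv is within the range Pig 0.17.0 was released for"
  else
    echo "Hadoop $hv is outside the range Pig 0.17.0 was released for (2.x, 2.7+)"
  fi
fi
```

On a cluster like the one in the question, `hadoop version` reports 3.1.1, so the check would print the "outside the range" message, which matches the explanation above.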

answered Jul 12 '19 at 13:58

tk421

Thanks for the reference :). I am confused because it was working in local mode. – sharif2008 Jul 12 '19 at 15:56

Yeah, local mode just uses the underlying OS's filesystem. – tk421 Jul 12 '19 at 16:08

Same for Hive 3.1.1; it doesn't support Hadoop 3.1.1 either. https://issues.apache.org/jira/browse/HIVE-20022 (just pasting the link here as a reference in case it helps someone) – sharif2008 Jul 12 '19 at 16:21
