I have an output file (stored on HDFS) from a MapReduce program, and I am trying to load it using Pig 0.7.0.

I am getting the following error. Copying the file to my local machine and running Pig in local mode works fine, but I want to skip that step and make it work in mapreduce mode.

Options I tried:

LOAD 'file://log/part-00000'
LOAD '/log/part-00000'
LOAD 'hdfs:/log/part-00000'
LOAD 'hdfs://localhost:50070/log/part-00000'

hadoop dfs -ls /log/
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   3  supergroup          0 2014-02-07 07:56 /log/_SUCCESS
drwxr-xr-x   -  supergroup          0 2014-02-07 07:55 /log/_logs
-rw-r--r--   3  supergroup      10021 2014-02-07 07:56 /log/part-00000

pig (running in mapreduce mode)

grunt> REC = LOAD 'file://log/part-00000' as (CREATE_TMSTP:chararray, MESSAGE_TYPE:chararray, MESSAGE_FROM:chararray, MESSAGE_TEXT:chararray);
grunt> DUMP REC;

Backend error message during job submission
-------------------------------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///log/part-00000
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:695)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/log/part-00000
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
    ... 7 more

Pig Stack Trace

ERROR 2997: Unable to recreate exception from backend error:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///log/part-00000

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias REC
    at org.apache.pig.PigServer.openIterator(PigServer.java:521)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///log/part-00000
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:268)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.store(PigServer.java:569)
    at org.apache.pig.PigServer.openIterator(PigServer.java:504)
    ... 6 more

Frederic
Ronak Patel
  • how about `REC = LOAD '/log' as (CREATE_TMSTP:chararray,MESSAGE_TYPE:chararray, MESSAGE_FROM:chararray, MESSAGE_TEXT:chararray)` – frail Feb 07 '14 at 16:26
  • Thank you for the quick reply. Same error: `Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/log`. I'll try with the upgraded Pig 0.12.0 and get back to you all with my findings. – Ronak Patel Feb 07 '14 at 17:15
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:01

1 Answer

You should try upgrading to a more recent version of Pig. Version 0.7.0 is quite a few years old (it was released in May 2010), and 0.12.0 is the current stable release.
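
For reference, here is a minimal sketch of the load as it would look after upgrading. The `hdfs:/log/part-00000` path and the schema are copied from the question, and this is the form the asker later reported working on Pig 0.12.0. The notes in the comments are my reading of the error output, not something I have verified on this cluster.

```pig
-- Run from the grunt shell in mapreduce mode.
-- A file:// URI points at the local filesystem, which matches the trace:
--   "Input path does not exist: file:/log/part-00000"
-- so the HDFS copy has to be addressed through the hdfs scheme.
-- Assumption: the part file is tab-delimited, which is what Pig's
-- default loader (PigStorage) expects when no USING clause is given.
REC = LOAD 'hdfs:/log/part-00000' AS (
    CREATE_TMSTP:chararray,
    MESSAGE_TYPE:chararray,
    MESSAGE_FROM:chararray,
    MESSAGE_TEXT:chararray
);
DUMP REC;
```

One more thing to check on the `hdfs://localhost:50070/...` attempt: on a default Hadoop 1.x setup, 50070 is the NameNode web UI port, not the filesystem port. If you want a fully qualified URI, it would normally use the host and port from `fs.default.name` in your `core-site.xml`.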

wdwtw
  • Thank you for the quick reply. I'll try with the upgraded Pig 0.12.0 and get back to you all with my findings. – Ronak Patel Feb 07 '14 at 17:18
  • Thank you, your trick worked. Pig 0.7 only works on Hadoop 0.20. I tried with Pig 0.12.0, and it worked! :) --> REC = LOAD 'hdfs:/log/part-00000' … [PigWithHadoop](http://pig.apache.org/releases.html#13+May%2C+2010%3A+release+0.7.0+available) – Ronak Patel Feb 07 '14 at 23:13
  • I'm glad the newer version is working for you. Most of the Hadoop ecosystem is pretty version specific. If you have the luxury of being able to run one of the distributions ([Apache](http://bigtop.apache.org/), [Cloudera](http://www.cloudera.com/content/support/en/downloads.html), [HortonWorks](http://hortonworks.com/products/download-archives/)) you will be saved the trouble of making sure all of your tools are version compatible. – wdwtw Feb 11 '14 at 22:12