
I've installed Pig and I'm loading a tab-separated text file by doing the following:

grunt> boys = LOAD '/user/pig_input/student-boys.txt' USING PigStorage ('\t') AS (name:chararray,state:chararray,attendance:float);

2014-07-27 11:54:04,414 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-27 11:54:04,414 [main] WARN  org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-27 11:54:04,525 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-27 11:54:04,525 [main] WARN  org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

However, when I try to dump the dataset I get the below errors:

grunt> DUMP boys;
2014-07-27 11:54:12,710 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias boys
Details at logfile: /home/hduser/tmp/pig_1406476412229.log

I tried looking into why this was happening but I'm having trouble understanding the exact reason.

hduser@hadoop:~/tmp$ cat /home/hduser/tmp/pig_1406476412229.log
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias boys

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias boys
    at org.apache.pig.PigServer.openIterator(PigServer.java:880)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:541)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1464)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2127)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:988)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1258)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:504)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1505)
    at org.apache.pig.backend.hadoop.datastorage.HDirectory.create(HDirectory.java:63)
    at org.apache.pig.backend.hadoop.datastorage.HPath.create(HPath.java:159)
    at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:481)
    at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:474)
    at org.apache.pig.PigServer.openIterator(PigServer.java:855)
    ... 12 more
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1464)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2127)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:988)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)

    at org.apache.hadoop.ipc.Client.call(Client.java:1028)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
    at com.sun.proxy.$Proxy0.mkdirs(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:84)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at com.sun.proxy.$Proxy0.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1256)
    ... 19 more

Can anyone help me understand why I'm getting these errors? What do I need to change so that I can properly dump the dataset that I loaded?

user7337271
  • What is your question? – MrAlias Jul 27 '14 at 16:04
  • I installed Pig and tried to load a CSV file and dump the data. However, when I try to dump the dataset I get the following errors: 1. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias boys 2. Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp. Please help to resolve this. Thanks! – Meenakshi-sundaram Meenakshi S Jul 27 '14 at 16:45
  • Where did you store your student-boys.txt file, local or HDFS? – Rengasamy Jul 28 '14 at 14:06
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:37

2 Answers


You're loading /user/pig_input/student-boys.txt. Can you try creating a script file with the following lines:

boys = LOAD '/user/pig_input/student-boys.txt' USING PigStorage ('\t') AS (name:chararray,state:chararray,attendance:float);

DUMP boys;

Then try to execute it using pig filename.pig.
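The suggestion above can be sketched as follows (boys.pig is an assumed file name; the pig invocations are shown commented out since they need a working Pig installation):

```shell
# Save the two Pig Latin statements from the question into a script file.
cat > boys.pig <<'EOF'
boys = LOAD '/user/pig_input/student-boys.txt' USING PigStorage('\t')
       AS (name:chararray, state:chararray, attendance:float);
DUMP boys;
EOF

# Then, on a host with Pig installed:
#   pig boys.pig            # mapreduce mode, reads the path from HDFS
#   pig -x local boys.pig   # local mode, reads the path from the local file system
```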

Note which execution mode the grunt shell is running in (local vs. mapreduce), since that determines where Pig looks for files.

Thanks and regards, Dheeraj Rampally.

Dheeraj R
  • Thanks Dheeraj. As per your suggestion I created a file with a .pig extension and tried to run it in local mode as pig -x local filename.pig; however, I'm now getting the following error: 2014-07-29 06:44:29,175 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2014-07-29 06:44:29,179 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache; – Meenakshi-sundaram Meenakshi S Jul 29 '14 at 10:45
  • Hi, if you run it in local mode then you should use a local path, i.e. a path present on that host. – Dheeraj R Jul 30 '14 at 23:46

How did you start the grunt shell? It seems you are in mapreduce mode while the file(s) you want to load are on the local file system. Either start the grunt shell with pig -x local and load the file from the local file system, or copy the file(s) you want to load to HDFS and start the shell with plain pig.
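A sketch of the second option (illustrative commands only; they assume a running Hadoop cluster, so the cluster commands are left commented out). Note that the stack trace in the question also says "Parent path is not a directory: /tmp", which suggests /tmp on HDFS may exist as a plain file and is worth checking:

```shell
# Hypothetical target path on HDFS, matching the LOAD statement in the question.
INPUT_DIR=/user/pig_input
INPUT_FILE="$INPUT_DIR/student-boys.txt"
echo "target HDFS path: $INPUT_FILE"

# Check whether /tmp on HDFS is a directory; if it is a plain file,
# remove it and recreate it as a world-writable directory:
#   hadoop fs -ls /tmp
#   hadoop fs -rm /tmp
#   hadoop fs -mkdir /tmp
#   hadoop fs -chmod 1777 /tmp

# To run in mapreduce mode, copy the input file to HDFS first:
#   hadoop fs -mkdir -p "$INPUT_DIR"
#   hadoop fs -put student-boys.txt "$INPUT_DIR/"
```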

user3810043
  • Yes, I tried both ways, and currently when I run in local mode I get the error: 2014-07-29 06:44:29,175 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2014-07-29 06:44:29,179 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String; – Meenakshi-sundaram Meenakshi S Jul 29 '14 at 10:48