
I'm attempting to get Apache Pig up and running on my Hadoop cluster, and am encountering a permissions problem. Pig itself launches and connects to the cluster just fine; from within the Pig shell, I can ls through and around my HDFS directories. However, when I actually try to load data and run Pig commands, I run into permissions-related errors:

grunt> A = load 'all_annotated.txt' USING PigStorage() AS (id:long, text:chararray, lang:chararray);
grunt> DUMP A;
2011-08-24 18:11:40,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=steven, access=WRITE, inode="":hadoop:supergroup:r-xr-xr-x
2011-08-24 18:11:40,977 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
Details at logfile: /Users/steven/Desktop/Hacking/hadoop/pig/pig-0.9.0/pig_1314230681326.log
grunt> 

In this case, all_annotated.txt is a file in my HDFS home directory that I created, and most definitely have permissions to; the same problem occurs no matter what file I try to load. However, I don't think the input file is the problem, as the error itself indicates Pig is trying to write somewhere. Googling around, I found a few mailing-list posts suggesting that certain Pig Latin statements (order, etc.) need write access to a temporary directory on the HDFS filesystem, whose location is controlled by the hadoop.tmp.dir property in hdfs-site.xml. I don't think load falls into that category, but just to be sure, I changed hadoop.tmp.dir to point to a directory within my HDFS home directory, and the problem persisted.
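For reference, checking the permissions from the shell looks something like this (assuming the usual /user/steven HDFS home directory):

    hadoop fs -ls /user/steven/all_annotated.txt   # confirm I own the input file
    hadoop fs -ls /                                # the inode from the error: hadoop:supergroup, mode r-xr-xr-x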

So, anybody out there have any ideas as to what might be going on?

Steven Bedrick
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 14:49

2 Answers


This is probably your pig.temp.dir setting. It defaults to /tmp on HDFS, and Pig writes its temporary results there. If you don't have permission to /tmp, Pig will complain. Try overriding it with -Dpig.temp.dir.
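For example, a quick sketch (the path /user/steven/tmp is just a placeholder for any HDFS directory you own):

    hadoop fs -mkdir /user/steven/tmp     # somewhere you have write access
    pig -Dpig.temp.dir=/user/steven/tmp   # -D properties must come before other pig arguments

If you administer the cluster, the more common fix is to create a world-writable /tmp on HDFS (e.g. `hadoop fs -mkdir /tmp` followed by `hadoop fs -chmod 777 /tmp`).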

Daniel Dai

The problem might be that hadoop.tmp.dir points to a directory on your local filesystem, not HDFS. Try setting that property to a local directory you know you have write access to. I've run into the same error using regular MapReduce in Hadoop.
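For example, something along these lines (the directory name is just an illustration):

    mkdir -p /home/steven/hadoop-tmp   # any local directory you can write to
    chmod 755 /home/steven/hadoop-tmp
    # then point hadoop.tmp.dir at it (core-site.xml on most setups) and restart Hadoop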

Matt D
  • Huh. Well, in that case, the error makes even less sense. I definitely have write access to /tmp on my local filesystem. Just to be sure, I changed it back, and the problem still occurs. I really think that whatever's going on is HDFS-related somehow. Thanks for the suggestion, though... – Steven Bedrick Aug 25 '11 at 18:34
  • `inode="":hadoop:supergroup:r-xr-xr-x` means that someone is trying to write to the HDFS directory `/`, which is owned by `hadoop:supergroup` (the error shows it's user `steven`). Try `hadoop fs -chmod 755 /`, which adds write permission for the `hadoop` user; you may need `775` if you are not executing as `hadoop` but are in the `supergroup` group. – Matt D Aug 25 '11 at 18:53
  • Thanks for the reply! I don't actually have permissions to "/"; I'm not the administrator of the cluster I'm using, so I don't think I'll be able to chmod anything at that level of the file system. Do you happen to know why Pig would be trying to write to the HDFS root? – Steven Bedrick Aug 26 '11 at 16:44
  • As per Daniel's answer, it looks like it was trying to create the directory `/tmp` in HDFS, and thus needed write access to `/` to create that directory. – Matt D Aug 26 '11 at 16:48