I can use my UDF on some fields but not on others. If I pass it my first field, ip, the UDF works as intended. However, if I pass it date instead, I get ERROR 1066. Here are my scripts.

Pig script that works and calls the UDF

REGISTER myudfs.jar;
DEFINE HOUR myudfs.HOUR;

A = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, port:int);
B = FOREACH A GENERATE HOUR(ip);
dump B;

Pig script that does not work and calls the UDF

REGISTER myudfs.jar;
DEFINE HOUR myudfs.HOUR;

A = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, port:int);
B = FOREACH A GENERATE HOUR(date);
dump B;

Pig script that works but does not call the UDF

REGISTER myudfs.jar;
DEFINE HOUR myudfs.HOUR;

A = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, port:int);
B = FOREACH A GENERATE date;
dump B;

Sample data

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245

Java UDF

 package myudfs;

 import java.io.IOException;
 import org.apache.pig.EvalFunc;
 import org.apache.pig.data.Tuple;
 import org.apache.pig.impl.util.WrappedIOException;

 public class HOUR extends EvalFunc<String>
 {
     @SuppressWarnings("deprecation")
     public String exec(Tuple input) throws IOException {
         if (input == null || input.size() == 0)
             return " ";
         try {
             String str = (String)input.get(0);
             return str.substring(0, 1);
         } catch(Exception e) {
             throw WrappedIOException.wrap("Caught exception processing input row ", e);
         }
     }
 }

Error

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B

If you need any other information, let me know. I get this error both when running locally and when running over MapReduce.

nook
  • Is there any other output besides `ERROR 1066`? Anything about a backend exception? Does your script begin to execute or does it fail with this error before launching any map-reduce jobs? – reo katoa May 03 '13 at 22:48
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:33

1 Answer

Could date be null some of the time? Your UDF has a null check for the tuple, but no check for `input.get(0)`.

If that field is null, `str.substring(0, 1)` throws a NullPointerException, which lands in your catch block and is rethrown, so the UDF fails. That could be what is causing this error.
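As a rough illustration of the missing check, here is a minimal sketch in plain Java. It uses a `List<Object>` as a stand-in for Pig's `Tuple` so that it runs without the Pig jars; the class name `NullSafeField` and the method `firstChar` are hypothetical and not part of the original UDF. It also guards against an empty string, which would otherwise make `substring(0, 1)` throw.

```java
import java.util.List;

/** Hypothetical stand-in for the field handling inside HOUR.exec(Tuple). */
final class NullSafeField {

    // Returns the first character of the first field, or null when the
    // tuple is missing, has no fields, or its first field is null or empty.
    static String firstChar(List<Object> fields) {
        if (fields == null || fields.isEmpty()) {
            return null;                 // no tuple, or a tuple with no fields
        }
        Object field = fields.get(0);
        if (field == null) {
            return null;                 // the field exists, but its value is null
        }
        String str = field.toString();
        return str.isEmpty() ? null : str.substring(0, 1);
    }
}
```

In the actual UDF the equivalent change would presumably be to test `input.get(0) == null` before the cast and return null in that case, so that a bad row produces a null in the output instead of failing the whole job.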

seedhead
  • Maybe I am wrong, but isn't that what the latter half of this `if (input == null || input.size() == 0) return " ";` statement is doing? – nook May 04 '13 at 01:33
  • No, that's making sure there is an input and that it has an item. Seedhead is saying that if input has an item but that item is null, i.e. `input.get(0) == null`, then your function will error. – DMulligan May 04 '13 at 02:29