I am new to Pig. My input data is:

(message,NIL,2015-07-01,22:58:53.66,E,machine.com.name,12,0xd6,String,String ,0,0.0,key=value&key=123456789&key=value&key=US&key=COMPANY&key=MESSAGE&key=123456789&key=String&key=String&Key=String&Key=String)

I have written a Java UDF, shown below, to parse the last field of the input data:

package com.pig.udf;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class PigUDF extends EvalFunc<Map> {


    @Override
    public Map<String, String> exec(Tuple input) throws IOException {
        // If tuple is null, has fewer than 3 values, or has an even number of
        // values
        if (input == null || input.size() < 3 || (input.size() % 2 == 0)) {
            throw new IOException("Incorrect number of values.");
        }

        String source = (String) input.get(0);
        System.out.println("input Source"+source);
        String delim = (input.size() > 1) ? (String) input.get(1) : "&";
        int length = (input.size() > 2) ? (Integer) input.get(2) : 0;
        if (source == null || delim == null) {
            return null;
        }

        String[] splits = source.split(delim, length);
        System.out.println("Splits"+ splits);
        ArrayList<String> arrayList = new ArrayList<String>(
                Arrays.asList(splits));
        Map<String, String> map = new HashMap<String, String>();
        for (String keyValue : arrayList) {
            int end = keyValue.indexOf('=');
            if (end != -1) {
                map.put(keyValue.substring(0, end), keyValue.substring(end + 1));
            }

        }
        System.out.println("map"+map);

        return map;

    }

} 

When I run my Pig script with the Java UDF above, I get the following error:

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias C

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C
    at org.apache.pig.PigServer.openIterator(PigServer.java:892)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:607)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:884)
    ... 13 more


Application Log
-------------------------------------------------------------------
    Application application_1436453941326_0020 failed 2 times due to AM Container for appattempt_1436453941326_0020_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1436453941326_0020/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1436453941326_0020_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application. 

My script runs fine without the Java UDF and produces an output file. The issue arises only when I include the Java UDF in my Pig script. There is no Java version mismatch between my Java UDF and the machine running Pig. Any pointers will be appreciated.

Pig script:

Register '/home/cloudera/Pig/PigUDF_1.7.jar';
Register '/home/cloudera/Pig/pig.jar';
 A= Load 'Logs_message.txt' using PigStorage(',') as (component:chararray,Nil:chararray,date:chararray,time:chararray,E:chararray,machine_address:chararray,number1:chararray,hex_number:chararray,cal_type:chararray,cal_name:chararray,number2:chararray,number3:chararray,data:chararray) 
 B = filter A by cal_name matches 'CHANGEDMESSAGE';
 C = foreach B generate cal_name ,com.pig.udf.PigUDF(data) as dataMap;
 dump C ;
Divya
  • how are you calling the udf? also, look for more detailed logs. – Frederic Jul 10 '15 at 11:28
  • Can you paste the Pig script where you are calling the UDF? I think the problem is in your Pig script. – Abhi Jul 10 '15 at 19:39
  • Hi @Fred, where can I find more detailed logs? – Divya Jul 11 '15 at 00:28
  • For people who found this post when looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution) here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – Dennis Jaheruddin Dec 28 '15 at 15:26

1 Answer

I see 3 issues with your code:

  1. You're missing a semicolon at the end of the Load statement. The script shouldn't even run like this, so I assume this was a mistake in copying it to Stack Overflow.
  2. You name a field "E", which is a reserved word. I'm not sure what impact this has, but I'd avoid it to be safe. See the list of reserved Pig keywords.
  3. (This is probably what's causing the error.) Your validations don't match how the UDF is used. It looks like you wrote a split function designed to take three or fewer parameters (the string to split, the delimiter, and the max split size), yet you reject any input with fewer than three parameters. Your script passes only one (data), so the UDF throws on every record. You also reject an even number of parameters, which looks like a check intended for the string after you've split it, not before.

Should be something like:

if (input == null || input.size() == 0 || input.size() > 3) {
  throw new IOException("Incorrect number of values.");
}
//...
if(splits.length % 2 != 0)
  throw new IOException("Invalid key value pairs");

I'd advise against running your programs in the cloud on Hadoop until you've debugged them; get them working locally first. If you use the PigServer class, you can debug UDFs on your development machine through Eclipse or another IDE.
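
Before involving Pig at all, the parsing logic itself can be checked in plain Java. Here is a minimal sketch, assuming a hypothetical helper class KeyValueParser that is not part of the original UDF; it mirrors the split-and-map loop from the question so it can be run and stepped through without a Pig runtime. Returning an empty map for null input is my own choice here (the original UDF returns null), and Pattern.quote is added so the delimiter is treated literally rather than as a regex.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (an assumption, not from the original post): isolates
// the key=value parsing so it can be tested without Pig on the classpath.
final class KeyValueParser {

    // Splits source on a literal delimiter and maps each "key=value" piece.
    // limit follows String.split semantics; 0 means no limit.
    static Map<String, String> parse(String source, String delim, int limit) {
        Map<String, String> map = new LinkedHashMap<>();
        if (source == null || delim == null) {
            return map; // assumption: empty map instead of null
        }
        for (String keyValue : source.split(Pattern.quote(delim), limit)) {
            int end = keyValue.indexOf('=');
            if (end != -1) {
                // A repeated key overwrites the earlier value.
                map.put(keyValue.substring(0, end), keyValue.substring(end + 1));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // prints {a=1, b=2, c=US}
        System.out.println(parse("a=1&b=2&c=US", "&", 0));
    }
}
```

Note that the sample input in the question repeats the same key many times, so with a map only the last value for each distinct key survives. Once this behaves as expected, the UDF's exec method only has to unpack the tuple and delegate to it.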

DMulligan