
I am getting the error "ERROR 2078: Caught error from UDF: com.Hadoop.pig.SplitRec [Caught exception processing input row [1]]". I am sure an index into the input string is going out of bounds, but I cannot tell which record (record number) is causing the problem.

I am trying to add logging that shows the record causing the problem, but I am not sure how to print or log the failing record from the UDF.

The input looks like:

    PXW01YIN 12000099PGEN PXW01YINFFFFFFFF PXW01YINIMFGUIPY04301Y301 JFK 00888JFK 008880001 PIMF 0000N/ACTRC5/TXN08/SCR301\/SEQ/TEX021\@
    PXW01PIN 12000099PGEN PXW01PINFFFFFFFF PXW01PINIMFGUIAV04301P301 PER 03615PER 036150001 PIMF 0000N/ACTRCK/TXN08/SCR301\/SEQ/TEX021\@

These are two of the records. I tested them (using LIMIT) and they do not cause the problem. The full input is more than 150 KB.
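One way to get the record number, sketched under assumptions: copy the input to the local machine (for example with `hadoop fs -get /user/hduser/output/realdata/pig_out6/part-m-00000 .`), run the same split logic as the UDF over every line in plain Java, and print the line number of each record that throws. `FindBadRecords` and its methods are hypothetical names. The sketch assumes one record per line with no tab characters; Pig's default loader, `PigStorage`, splits lines on tabs, so with tabs present Pig would pass only the text before the first tab to the UDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical local debugging helper (not part of Pig): applies the same
 * steps as SplitRec.exec() to each line and reports the lines that throw.
 */
public class FindBadRecords {

    // Mirrors the UDF: split on the regex "PIMF+", take tokens[1],
    // and drop its first 8 characters.
    static String splitLikeUdf(String record) {
        String[] tokens = record.split("PIMF+");
        String part4 = tokens[1];
        return tokens[0] + ":" + part4.substring(8, part4.length());
    }

    // Returns "line N: <exception>" for every failing line (N is 1-based).
    static List<String> findFailures(List<String> lines) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            try {
                splitLikeUdf(lines.get(i));
            } catch (RuntimeException e) {
                failures.add("line " + (i + 1) + ": " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local copy of the Pig input file.
        // ISO-8859-1 maps every byte to a char, so unusual bytes cannot
        // abort the read with a decoding error.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String failure : findFailures(lines)) {
            System.out.println(failure);
        }
    }
}
```

For example, `java FindBadRecords part-m-00000` prints one line per record that would make the UDF fail, together with the exception it would throw.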

The script that I am using:

    SPLT_REC1 = load '/user/hduser/output/realdata/pig_out6/part-m-00000' as (tran_array:chararray);
    register /home/cloudera/workspace/SplitRec.jar;
    define SplitRec com.Hadoop.pig.SplitRec();
    SPLT_REC2 = foreach SPLT_REC1 generate SplitRec(tran_array);
    store SPLT_REC2 into '/user/hduser/output/realdata/pig_out7';

The UDF:

    package com.Hadoop.pig;

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.impl.util.WrappedIOException;

    @SuppressWarnings("deprecation")
    public class SplitRec extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0)
                return null;

            try {
                String Str1 = (String) input.get(0);
                // split() takes a regex: "PIMF+" matches "PIM" followed by one or more 'F'
                String delim1 = "PIMF+";
                String[] tokens1 = Str1.split(delim1);

                // Text before the delimiter
                String part3 = tokens1[0];
                // Throws ArrayIndexOutOfBoundsException if the record has no "PIMF"
                String part4 = tokens1[1];
                int len1 = part4.length();
                // Drops the first 8 characters after the delimiter; throws
                // StringIndexOutOfBoundsException if part4 is shorter than 8
                String part5 = part4.substring(8, len1);

                String conCat1 = part3 + ":" + part5;
                return conCat1;
            }
            catch (Exception e) {
                throw WrappedIOException.wrap("Caught exception processing input row ", e);
            }
        }
    }
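The `[1]` at the end of the error message is consistent with an `ArrayIndexOutOfBoundsException` for index 1, which `tokens1[1]` throws when a record contains no `PIMF` at all; a record with fewer than 8 characters after the delimiter would instead make `substring(8, len1)` throw. Below is a sketch of the same parsing with explicit checks, assuming it is acceptable to skip malformed rows by returning `null` (PigStorage writes a null field as an empty value). `SafeSplit` and `parse` are hypothetical names. The regex `PIMF+` is kept unchanged, but note that `String.split` treats it as a regular expression (`PIM` followed by one or more `F`), not as the literal text `PIMF+`.

```java
/**
 * Hypothetical bounds-checked version of the SplitRec parsing logic.
 * Returns null instead of throwing when a record does not have the
 * expected shape, so one bad row does not fail the whole job.
 */
public class SafeSplit {

    // Same regex as the UDF: "PIM" followed by one or more 'F'
    private static final String DELIM = "PIMF+";
    // Characters dropped after the delimiter, as in substring(8, len1)
    private static final int SKIP = 8;

    static String parse(String record) {
        if (record == null) {
            return null;
        }
        String[] tokens = record.split(DELIM);
        if (tokens.length < 2) {
            // No delimiter (or nothing after it): tokens[1] would be out of bounds
            return null;
        }
        String after = tokens[1];
        if (after.length() < SKIP) {
            // substring(8, len) would throw StringIndexOutOfBoundsException
            return null;
        }
        return tokens[0] + ":" + after.substring(SKIP);
    }

    public static void main(String[] args) {
        System.out.println(parse("ABC PIMF 0000N/ACTRC5")); // ABC :CTRC5
        System.out.println(parse("NO DELIMITER HERE"));     // null
    }
}
```

Inside the UDF, `exec()` could then end with `return SafeSplit.parse((String) input.get(0));` instead of letting the exception fail the task.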
diggi05
  • Have the exception print out each part, like `throw WrappedIOException.wrap("Caught exception processing input row |" + tokens1 + "|" + part3 + "|" + part4, e);` – Petro Mar 02 '16 at 02:21
  • Thanks for your reply, but I don't think the variables can be used in the throw, because they are not in scope in the catch block. With the code you provided, I get these errors: "part4 cannot be resolved to a variable - part3 cannot be resolved to a variable - tokens1 cannot be resolved to a variable" – diggi05 Mar 02 '16 at 17:17
  • Just make them globals for testing and see what happens – Petro Mar 02 '16 at 18:51
  • Make sure that the data loaded into `SPLT_REC1` is in the proper, expected format. You can try `ILLUSTRATE SPLT_REC1;` – rahulbmv Mar 04 '16 at 01:15
  • Thanks for helping me out. It turned out to be a problem with the input records themselves. I changed the validation in the UDF to `if (!Str1.substring(114,118).equalsIgnoreCase(pImf)) { return null; }`, which works for my requirements. – diggi05 Mar 04 '16 at 03:53
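The compiler errors in the second comment occur because `tokens1`, `part3`, and `part4` are declared inside the `try` block, so they are out of scope in `catch`. A minimal sketch of the fix, in plain Java using `IOException`'s `(String, Throwable)` constructor in place of Pig's `WrappedIOException`: declare the variables before `try` and put the record itself into the message. A local declared before `try` is enough; a field (the "global" suggested above) would also compile, but since a UDF instance is typically reused across rows, a field could carry stale values from an earlier row. The sketch also adds a length guard to the `substring(114,118)` check from the last comment, because that call throws on records shorter than 118 characters. `SplitRecDebug`, `exec(String)`, `hasMarker`, and the example value `"PIMF"` for `pImf` are hypothetical.

```java
import java.io.IOException;

/**
 * Hypothetical plain-Java sketch of the debugging approach from the comments.
 * "record" plays the role of Str1 in the UDF.
 */
public class SplitRecDebug {

    static String exec(String record) throws IOException {
        // Declared before try so they are still in scope inside catch.
        String[] tokens1 = null;
        String part4 = null;
        try {
            tokens1 = record.split("PIMF+");
            String part3 = tokens1[0];
            part4 = tokens1[1];
            return part3 + ":" + part4.substring(8, part4.length());
        } catch (RuntimeException e) {
            // -1 means split() never ran (e.g. record was null).
            int tokenCount = (tokens1 == null) ? -1 : tokens1.length;
            throw new IOException("Caught exception processing input row |" + record
                    + "| tokens=" + tokenCount + " part4=|" + part4 + "|", e);
        }
    }

    // Length-guarded version of the check from the last comment;
    // pImf is assumed to hold the expected marker, e.g. "PIMF".
    static boolean hasMarker(String record, String pImf) {
        return record != null
                && record.length() >= 118
                && record.substring(114, 118).equalsIgnoreCase(pImf);
    }

    public static void main(String[] args) {
        try {
            exec("NO DELIMITER HERE");
        } catch (IOException e) {
            // Prints: Caught exception processing input row |NO DELIMITER HERE| tokens=1 part4=|null|
            System.out.println(e.getMessage());
        }
    }
}
```

With the record text in the message, the Pig task log shows the exact input that failed, which answers the original question of which record is causing the error.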
