I am getting the error "ERROR 2078: Caught error from UDF: com.Hadoop.pig.SplitRec [Caught exception processing input row [1]]". I am fairly sure that a substring index is going out of bounds on some input string, but I cannot tell which record (which record number) is causing the problem.
I want to log the record that triggers the error, but I am not sure how to print/log the offending record from inside the UDF for debugging.
The input looks like:
PXW01YIN 12000099PGEN PXW01YINFFFFFFFF PXW01YINIMFGUIPY04301Y301 JFK 00888JFK 008880001 PIMF 0000N/ACTRC5/TXN08/SCR301\/SEQ/TEX021\@
PXW01PIN 12000099PGEN PXW01PINFFFFFFFF PXW01PINIMFGUIAV04301P301 PER 03615PER 036150001 PIMF 0000N/ACTRCK/TXN08/SCR301\/SEQ/TEX021\@
The two lines above are two records; I have tested them (using LIMIT) and they are not causing the problem. I have more than 150 KB of input data.
The Pig script that I am using:

-- load each line of the part file as a single chararray field
SPLT_REC1 = LOAD '/user/hduser/output/realdata/pig_out6/part-m-00000' AS (tran_array:chararray);

-- register the jar that contains the UDF and define a shorthand for it
REGISTER /home/cloudera/workspace/SplitRec.jar;
DEFINE SplitRec com.Hadoop.pig.SplitRec();

-- apply the UDF to every record and store the result
SPLT_REC2 = FOREACH SPLT_REC1 GENERATE SplitRec(tran_array);
STORE SPLT_REC2 INTO '/user/hduser/output/realdata/pig_out7';
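To narrow down the failing record, I was considering numbering the records before the UDF runs, along these lines (a sketch, assuming Pig 0.11+ where the RANK operator is available; `recno` is just a name I made up):

```pig
-- RANK prepends a 1-based record number to every tuple
RANKED = RANK SPLT_REC1;
-- carry the record number alongside the UDF output
SPLT_REC2 = FOREACH RANKED GENERATE $0 AS recno, SplitRec(tran_array);
```

But I do not know how to tie that number back to the specific row the UDF throws on.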
The UDF code:

package com.Hadoop.pig;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;

@SuppressWarnings("deprecation")
public class SplitRec extends EvalFunc<String> {

    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String Str1 = (String) input.get(0);

            // String.split() takes a regex, so "PIMF+" matches "PIM" followed by one or more "F"
            String delim1 = "PIMF+";
            String[] tokens1 = Str1.split(delim1);

            // throws ArrayIndexOutOfBoundsException if the delimiter never occurs in the record
            String part3 = tokens1[0];
            String part4 = tokens1[1];

            int len1 = part4.length();
            // throws StringIndexOutOfBoundsException if part4 is shorter than 8 characters
            String part5 = part4.substring(8, len1);

            String conCat1 = part3 + ":" + part5;
            return conCat1;
        }
        catch (Exception e) {
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }
    }
}
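What I think I need is something like the following change to exec (a sketch, not tested): keep a reference to the raw record outside the try block, then report it both through the UDF's logger and in the wrapped exception message. I am assuming here that EvalFunc exposes a Commons Logging instance as the protected field `log`:

```java
public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0)
        return null;
    String rec = null;  // hold the raw record so the catch block can report it
    try {
        rec = (String) input.get(0);
        String[] tokens1 = rec.split("PIMF+");
        String part5 = tokens1[1].substring(8);
        return tokens1[0] + ":" + part5;
    }
    catch (Exception e) {
        // goes to the task's log, so the failing mapper shows the bad record
        log.warn("SplitRec failed on record: " + rec, e);
        // and the record also appears in the ERROR 2078 message itself
        throw WrappedIOException.wrap("Caught exception processing input row: " + rec, e);
    }
}
```

Is this the right approach, or is there a standard way in Pig to get the record number of the row a UDF fails on?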