Questions tagged [pig-udf]

A Pig user-defined function (UDF) is used to specify custom processing. Pig UDFs can currently be implemented in five languages: Java, Python, JavaScript, Ruby, and Groovy.

24 questions
3
votes
2 answers

How to create a UDF in Pig to categorize a column with respect to another field

I want to categorize one column with respect to another column using a UDF in Pig.

Data I have:
Id,name,age
1,jhon,31
2,adi,15
3,sam,25
4,lina,28

Expected output:
1,jhon,31,30-35
2,adi,15,10-15
3,sam,25,20-25
4,lina,28,25-30

Please suggest.
user8510536
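One way to approach the question above is a small Python (Jython) UDF that maps an age to a 5-year range. This is only a sketch: the function name `age_bucket`, the constant `WIDTH`, and the bucket rule (upper bound inclusive, inferred from the expected output where 15 maps to 10-15 and 25 maps to 20-25) are assumptions, not something the question states.

```python
# Sketch of a Pig Python (Jython) UDF; all names here are illustrative.
try:
    outputSchema  # Pig provides this decorator when the script is registered with jython
except NameError:
    def outputSchema(schema):
        # Fallback so the file can also be run and tested outside Pig.
        def decorator(func):
            return func
        return decorator

WIDTH = 5  # assumed bucket width, taken from the ranges in the expected output


@outputSchema("age_range:chararray")
def age_bucket(age):
    """Return a range label such as '30-35' for the given age."""
    if age is None:
        return None
    age = int(age)
    upper = -(-age // WIDTH) * WIDTH  # round up to the next multiple of WIDTH
    lower = upper - WIDTH
    return "%d-%d" % (lower, upper)
```

In a Pig script this might be wired up with `REGISTER 'age_udf.py' USING jython AS myfuncs;` followed by a `FOREACH ... GENERATE Id, name, age, myfuncs.age_bucket(age);` (the file name and alias are hypothetical).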
1
vote
1 answer

Pig UDF Throwing NullPointerException When Generating New Tuple

I have a Pig UDF which ingests some data and then attempts to transform that data in a minimal manner. my_data = LOAD 'path/to/data' USING SomeCustomLoader(); my_other_data = FOREACH my_data GENERATE MyUDF(COL_1, COL_2, $param1, $param2) as…
clo_jur
1
vote
1 answer

Pig UDF efficiency against cascaded built-in functions

I am new to Pig scripting. I had a requirement where I needed to perform a ladder if-else for up to 10 conditions. From what I know, we only have the ternary operator, so I was thinking of writing a UDF instead of cascading the ternary operator like…
Vik U
1
vote
0 answers

ERROR 2078: Caught error from UDF

I am getting the error "ERROR 2078: Caught error from UDF: com.Hadoop.pig.SplitRec [Caught exception processing input row [1]]". I am sure that the input string is going out of bounds, but I am not sure which record (record number) is causing the…
diggi05
0
votes
0 answers

PIG - ExecException: ERROR 1070: Could not resolve testPyUDF.testFunc using imports

I am facing a basic issue importing a Python script into Pig. I'm just trying a simple script like this: Pig script: REGISTER 'test.py' using jython as testPyUDF; load_data = LOAD '$input_path' USING PigStorage(',') AS (row1); resp = FOREACH…
0
votes
0 answers

Can we do bean injection to Pig UDF

I am using Spring Boot in my project that runs a Pig script. I have defined a Pig UDF. I wonder whether I can inject a bean into this UDF class. The UDF class is something like this: The UDF that works now looks like this: public class MyUDF extends…
larueroad
0
votes
1 answer

Pig job fails with "org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120"

We are processing 50 million records, and at the end of processing we use the rank function in the Pig script. The Pig job fails while executing the rank function, and we get the error below: …
user9185088
0
votes
0 answers

Failed to parse: could not instantiate '' with arguments 'null' while using UDFContext for json parsing

I am getting the error below: Failed to parse: could not instantiate '' with arguments 'null' while using UDFContext for json parsing ... Caused by: java.io.EOFException: No content to map to Object due to end of input Of course, not getting…
0
votes
1 answer

Getting error 1070 while using UDF in pig

I am getting error 2017-10-29 03:34:22,212 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Syntax error, unexpected symbol at or near ''/home/harsh/Hunny/HadoopPractice/Pig/Upper.jar'' while running pig script. How i…
Harsh
0
votes
1 answer

Unable to open iterator for alias

I know this is one of the most repeated questions. I have looked almost everywhere and none of the resources could resolve the issue I am facing. Below is a simplified version of my problem statement. The actual data is a little more complex, so I have…
Rahul
0
votes
2 answers

Unable to store alias C, while trying to use Python UDF in pig

My Python UDF code:
# commaFormat - format a number with commas, 12345 -> 12,345
@outputSchema("numformat:chararray")
def commaFormat(num):
    return '{:,}'.format(num)

My Pig script:
DEFINE CSVExcelStorage…
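The excerpt does not show the actual error, so the following is not a diagnosis. As a sketch, comma grouping can also be written without the `'{:,}'` format specifier, which needs Python 2.7 or later; whether the Jython bundled with a given Pig release supports it is something to check. The function name `comma_format` and the `None` handling are assumptions added for illustration.

```python
# Sketch of a comma formatter that avoids the ',' format specifier.
try:
    outputSchema  # Pig provides this decorator when the script is registered with jython
except NameError:
    def outputSchema(schema):
        # Fallback so the file can also be run and tested outside Pig.
        def decorator(func):
            return func
        return decorator


@outputSchema("numformat:chararray")
def comma_format(num):
    """Format a number with thousands separators, e.g. 12345 -> '12,345'."""
    if num is None:
        # Pig passes nulls as None; return null instead of raising.
        return None
    text = str(num)
    sign = ''
    if text.startswith('-'):
        sign, text = '-', text[1:]
    # Split off any fractional part so only the integer digits are grouped.
    int_part, dot, frac = text.partition('.')
    groups = []
    while len(int_part) > 3:
        groups.insert(0, int_part[-3:])
        int_part = int_part[:-3]
    groups.insert(0, int_part)
    return sign + ','.join(groups) + dot + frac
```

Because the input is converted with `str()`, the same function accepts either a numeric field or a chararray field loaded from a CSV file.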
0
votes
1 answer

How to combine two lines in pig based on the given format?

I am trying to process a file. As of now I am getting the output shown below.

Input file:
c=1,2,3
a,b,c,d,a
d,e,f
g,h,i,i
c=2,3,4
j,k,l
m,n,a,h
c=3,2,5
d,g,a
s,fs,a

Expected output:
c=1,2,3,a,b,c,d,a
c=1,2,3,d,e,f
…
ankush reddy
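The transformation asked for above amounts to remembering the most recent `c=` line and prefixing it to every data line that follows. A minimal Python sketch of that logic is below; the function name `attach_headers` and the `prefix` parameter are hypothetical, and how the lines would be fed to it in order inside Pig (for example as a bag passed to a UDF) is left open.

```python
def attach_headers(lines, prefix='c='):
    """Prefix each data line with the most recent header line.

    A header is any line starting with `prefix`. Data lines that appear
    before the first header are skipped, since there is nothing to attach.
    """
    current = None
    combined = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(prefix):
            current = line  # remember the new header
        elif current is not None:
            combined.append(current + ',' + line)
    return combined
```

The result depends on line order, so this only works where the original order of the file is preserved up to the point the function is called.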
0
votes
1 answer

Pig generate a key change column - comparing previous record with current record but different column

My input data will be in the below format.

   col1  col2  col3    effective date  expiry date
1  Q1    A1    Value1  01/01           01/02
2  Q1    A1    Value1  01/02           01/03
3  Q1    A1    Value1  01/03           01/05
4  Q1    A1    …
Aandal
0
votes
1 answer

How does Pig instantiate UDF objects

Can someone tell me how Pig instantiates UDF objects? I used Pig to construct a pipeline to process some data. I deployed the pipeline in a multi-node Hadoop cluster, and I want to save all intermediate results that are produced after each step in the…
Trams
0
votes
1 answer

Can we split PDF files using Pig UDFs?

I have 100 PDFs, each with 40 pages, and they are not being processed. We are trying to use a Pig UDF. Can we split PDF files using a Pig UDF?