3

I want to categorize one column based on the value of another column using a UDF in Pig.

Data I have:

Id,name,age
1,jhon,31
2,adi,15
3,sam,25
4,lina,28

Expected output

1,jhon,31,30-35
2,adi,15,10-15
3,sam,25,20-25
4,lina,28,25-30

Please suggest an approach.

LUZO

2 Answers

1

You can do this without a UDF, assuming you have loaded the data into a relation A with age declared as int.

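-- Reference fields by name inside FOREACH; A.Id would be treated as a scalar projection of the whole relation.
-- age must be an int so that age / 5 is integer division.
B = FOREACH A GENERATE Id, name, age, (age % 5 == 0 ? age - 5 : (age / 5) * 5) AS lower_age, (age % 5 == 0 ? age : ((age / 5) * 5) + 5) AS upper_age;
C = FOREACH B GENERATE Id, name, age, CONCAT(CONCAT((chararray)lower_age, '-'), (chararray)upper_age) AS age_range;
DUMP C;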
nobody
  • Thank you for your input. Can you please let me know the UDF process for the above requirement? –  Oct 18 '17 at 05:00
0

You can create Pig UDFs in Eclipse.

Create a project in Eclipse with the Pig JARs on the build path and try the code below.

package com;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;



public class Age extends EvalFunc<String> {

    // Maps the age in the first field of the input tuple to a range label.
    @Override
    public String exec(Tuple a) throws IOException {
        if (a == null || a.size() == 0) {
            return null;
        }
        try {
            Object object = a.get(0);
            if (object == null) {
                return null;
            }
            int i = (Integer) object;
            if (i < 10) {
                // Without this branch, ages below 10 would be labelled ">30".
                return "<10";
            } else if (i <= 20) {
                return "10-20";
            } else if (i <= 30) {
                return "21-30";
            } else {
                return ">30";
            }
        } catch (ExecException e) {
            throw new IOException(e);
        }
    }

}

Now export the project as a JAR and register it in the Pig shell:

REGISTER <path of your .jar file>;

Define an alias for the UDF using its package and class name:

DEFINE U com.Age();

a = LOAD '<input path>' using PigStorage(',') as (id:int,name:chararray,age:int);

b = FOREACH a GENERATE id,name,age,U(age);
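
The `Age` class above returns 10-year labels, while the question's expected output uses 5-year ranges in which a boundary age such as 15 or 25 falls in the range that ends there. A minimal sketch of that bucketing logic in plain Java follows. The class name `AgeRange`, the method `bucket`, and the constant `WIDTH` are hypothetical names chosen for illustration, and the width of 5 is an assumption read off the sample rows. The body of `bucket` could presumably be called from `exec` in place of the `if`/`else` chain.

```java
// Hypothetical helper (not part of the original answer): maps an age to the
// 5-year range shown in the question's expected output.
final class AgeRange {
    // Bucket width; assumed from the sample rows (30-35, 10-15, 20-25, 25-30).
    private static final int WIDTH = 5;

    private AgeRange() {}

    static String bucket(int age) {
        if (age <= 0) {
            throw new IllegalArgumentException("age must be positive: " + age);
        }
        // Boundary ages (15, 25) belong to the range that ends at them,
        // so subtract one before the integer division.
        int lower = ((age - 1) / WIDTH) * WIDTH;
        return lower + "-" + (lower + WIDTH);
    }

    public static void main(String[] args) {
        // Ages taken from the sample data in the question.
        for (int age : new int[] {31, 15, 25, 28}) {
            System.out.println(age + " -> " + bucket(age));
        }
    }
}
```

For the four sample ages this gives 30-35, 10-15, 20-25, and 25-30, which matches the expected output in the question.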