I am new to Pig scripting. I have a requirement to implement an if-else ladder with up to 10 conditions. As far as I know, Pig only offers the ternary operator, so I was thinking of writing a UDF instead of cascading the ternary operator like this: (condition1 ? statement1 : (condition2 ? statement2 : statement3))

The data size is in the tens of millions of rows. Should I even put effort into creating a UDF for this requirement?

If it causes performance problems in the end, there would be no point in making the effort.

From what I know, the UDF will be called once for each row under consideration, and repeating that call over millions of records is a serious overhead.

Vik U

1 Answer

I think that if you have access to a big cluster, the UDF shouldn't be a problem, and it will improve the readability of your script. In the end, your Pig script is itself compiled into jobs that run as Java code. The biggest performance win comes from filtering your data before the expensive operations.
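
To make the cost concrete, here is a minimal sketch of what the ladder logic might look like in Java. The class name, thresholds, and labels are all hypothetical, chosen only for illustration. It is kept free of Pig dependencies so it runs on its own; in an actual UDF, the same method body would presumably sit inside `exec(Tuple)` of a class extending `org.apache.pig.EvalFunc`.

```java
public class LadderSketch {

    // Hypothetical if-else ladder: the thresholds and labels are made up.
    // Each call does at most a few comparisons and is not recursive.
    static String classify(Integer score) {
        if (score == null) {
            return null; // pass nulls through, as Pig expressions usually do
        }
        if (score < 10) {
            return "very_low";
        } else if (score < 20) {
            return "low";
        } else if (score < 50) {
            return "medium";
        } else {
            return "high";
        }
    }

    public static void main(String[] args) {
        // One call per sample value, mirroring one UDF call per row.
        int[] samples = {5, 15, 42, 99};
        for (int s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The ladder is evaluated once per row and returns at the first matching branch, so the work per row is comparable to the nested ternary version; the difference is mostly readability.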

kecso
Thanks, kecso. I did the same thing: I removed the unnecessary attributes from the data first and then applied the complex logic, and the code is performing a little better now. – Vik U, Jun 12 '16 at 07:23