
How can I left-pad numbers with zeros in Pig Latin?

I have the year, month, and date in three fields, and I want to build a YYYY-MM-DD string from them. I see in "Formatting Date in Generate Statement" that I can use CONCAT to get them into YYYY-MM-DD format, but months and dates below 10 are not left-padded with zeros.

So instead of 2014-01-01, I get 2014-1-1.


1 Answer


You can solve this problem in three ways.

Option 1: If you have Pig 0.14 or later installed, use the built-in SPRINTF function.

input

2014    11      12
2013    01      02
2012    12      3
2011    5       24
2010    1       1

PigScript:

A = LOAD 'input' USING PigStorage() AS(year:int,month:int,date:int);
B = FOREACH A GENERATE SPRINTF('%04d-%02d-%02d',year,month,date) AS (finaldate:chararray);
DUMP B;

Output:

(2014-11-12)
(2013-01-02)
(2012-12-03)
(2011-05-24)
(2010-01-01)

Reference: http://pig.apache.org/docs/r0.14.0/func.html#sprintf
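The Pig documentation linked above describes SPRINTF as using Java's Formatter, so the same format string can be tried in plain Java before running the Pig job. The following is only a minimal sketch; the class and method names (DatePadSketch, format) are hypothetical and not part of Pig.

```java
import java.util.Locale;

// Hypothetical helper, not part of Pig: tries out the '%04d-%02d-%02d' template locally.
class DatePadSketch {
    // %04d pads the year to 4 digits; %02d pads month and day to 2 digits.
    static String format(int year, int month, int day) {
        return String.format(Locale.ROOT, "%04d-%02d-%02d", year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(format(2012, 12, 3)); // prints 2012-12-03
        System.out.println(format(2010, 1, 1));  // prints 2010-01-01
    }
}
```

The 0 flag in each specifier is what turns single-digit months and days into two-digit values.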

Option 2: For Pig 0.13 or earlier, use a custom UDF.

PigScript:

REGISTER leftformat.jar;
A = LOAD 'input' USING PigStorage() AS(year:chararray,month:chararray,date:chararray);
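-- Two-argument CONCAT calls are nested so this also runs on older Pig releases whose CONCAT accepts only two arguments.
B = FOREACH A GENERATE CONCAT(year, CONCAT('-', CONCAT(month, CONCAT('-', date)))) AS finalDate;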
C = FOREACH B GENERATE format.LEFTFORMAT(finalDate);
DUMP C;

Output:

(2014-11-12)
(2013-01-02)
(2012-12-03)
(2011-05-24)
(2010-01-01)

LEFTFORMAT.java

package format;

import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF that zero-pads each part of a "year-month-day" string to YYYY-MM-DD.
public class LEFTFORMAT extends EvalFunc<String> {
    @Override
    public String exec(Tuple arg0) throws IOException {
        // Return null for a missing or null input field instead of failing the row.
        if (arg0 == null || arg0.size() == 0 || arg0.get(0) == null) {
            return null;
        }
        try {
            // Split once into year, month and date.
            String[] parts = ((String) arg0.get(0)).split("-");
            return StringUtils.leftPad(parts[0], 4, "0") + "-"
                    + StringUtils.leftPad(parts[1], 2, "0") + "-"
                    + StringUtils.leftPad(parts[2], 2, "0");
        } catch (Exception e) {
            throw new IOException("Caught exception while processing the input row ", e);
        }
    }
}

Reference:
Left padding a string in pig
It explains how to compile the UDF, build the JAR, and register it in a Pig script.
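The padding logic of the UDF can be checked outside Pig before building the JAR. The sketch below is a dependency-free stand-in that replaces StringUtils.leftPad with a small loop; the names LeftPadSketch, leftPadZero and normalize are hypothetical and exist only for this illustration.

```java
// Hypothetical stand-alone check of the Option 2 padding logic (no Pig or commons-lang needed).
class LeftPadSketch {
    // Left-pads s with '0' up to the given width; longer strings are returned unchanged.
    static String leftPadZero(String s, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    // Splits "year-month-date" once and pads each part, as the UDF does.
    static String normalize(String input) {
        String[] parts = input.split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected year-month-date, got: " + input);
        }
        return leftPadZero(parts[0], 4) + "-"
                + leftPadZero(parts[1], 2) + "-"
                + leftPadZero(parts[2], 2);
    }

    public static void main(String[] args) {
        System.out.println(normalize("2011-5-24")); // prints 2011-05-24
        System.out.println(normalize("2010-1-1"));  // prints 2010-01-01
    }
}
```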

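Option 3: Parse the concatenated string into a datetime with ToDate, then convert it back to a string with ToString. You can use any of the supported formats for parsing; passing an output format to ToString gives the YYYY-MM-DD result.

ToString(ToDate(<CONCAT your inputs>, <supportedFormat>), 'yyyy-MM-dd')

The supported formats are listed in the question linked below.
Human readable String date converted to date using Pig?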
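The parse-then-format idea behind Option 3 can be illustrated in plain Java. This is only a sketch under assumptions: it uses java.time as a stand-in for Pig's date functions, and the patterns 'yyyy-M-d' and 'yyyy-MM-dd' as well as the names ParseFormatSketch and normalize are chosen here for illustration, not taken from the linked question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration: parse an unpadded date, then print it zero-padded.
class ParseFormatSketch {
    // Single-letter M and d accept one- or two-digit values when parsing.
    private static final DateTimeFormatter IN = DateTimeFormatter.ofPattern("yyyy-M-d");
    // MM and dd always print two digits.
    private static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String normalize(String raw) {
        return LocalDate.parse(raw, IN).format(OUT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("2014-1-1"));  // prints 2014-01-01
        System.out.println(normalize("2011-5-24")); // prints 2011-05-24
    }
}
```

Going through a real date type means the padding comes from the output pattern rather than from string manipulation.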

Sivasakthi Jayaraman