
I would like to left-pad a string field with zeros. Is there any way to do that? I need fixed-length values (40 characters).

thanks in advance, Clairvoyant

clairvoyant

1 Answer


The number of zeros has to be generated dynamically from the length of each input string, so I don't think it's possible in native Pig.
It is quite possible with a UDF, though.
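To make the "generated dynamically" part concrete, here is a minimal sketch of the padding logic in plain Java, without commons-lang. The class and method names (LeftPadSketch, leftPad) are hypothetical and only for illustration; the UDF in this answer uses StringUtils.leftPad instead.

```java
// Minimal sketch of fixed-width left padding in plain Java (no commons-lang).
// LeftPadSketch and leftPad are hypothetical names used for illustration only.
final class LeftPadSketch {

    // Returns value left-padded with padChar up to width characters.
    // Values already width characters or longer are returned unchanged; null stays null.
    static String leftPad(String value, int width, char padChar) {
        if (value == null) {
            return null;
        }
        int missing = width - value.length(); // number of pad characters to add
        if (missing <= 0) {
            return value;
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = 0; i < missing; i++) {
            sb.append(padChar);
        }
        return sb.append(value).toString();
    }

    public static void main(String[] args) {
        System.out.println(leftPad("11111", 40, '0'));
        System.out.println(leftPad("apachepig", 40, '0'));
    }
}
```

The pad count is simply the target width minus the input length, so every row gets its own number of zeros and inputs longer than the width pass through untouched.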

input.txt

11111
222222222
33
org.apache.hadoop.util.NativeCodeLoader
apachepig

PigScript:

REGISTER leftformat.jar;

A = LOAD 'input.txt' USING PigStorage() AS(f1:chararray);
B = FOREACH A GENERATE format.LEFTPAD(f1);
DUMP B;

Output:

(0000000000000000000000000000000000011111)
(0000000000000000000000000000000222222222)
(0000000000000000000000000000000000000033)
(0org.apache.hadoop.util.NativeCodeLoader)
(0000000000000000000000000000000apachepig)

UDF code: the Java class below is compiled and packaged as leftformat.jar
LEFTPAD.java

package format;

import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF that left-pads its first argument with "0" to a fixed width of 40.
public class LEFTPAD extends EvalFunc<String> {
    @Override
    public String exec(Tuple arg) throws IOException {
        try {
            // Values that are already 40 characters or longer are returned unchanged.
            String input = (String) arg.get(0);
            return StringUtils.leftPad(input, 40, "0");
        } catch (Exception e) {
            throw new IOException("Caught exception while processing the input row ", e);
        }
    }
}

UPDATE:

1. Download the 4 jar files (apache-commons-lang.jar, piggybank.jar, pig-0.11.0.jar and hadoop-common-2.6.0-cdh5.4.5); the first three are available from the links below
http://www.java2s.com/Code/Jar/a/Downloadapachecommonslangjar.htm
http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm
http://www.java2s.com/Code/Jar/p/Downloadpig0110jar.htm

2. Add the jar files to your classpath (the example below lists three; append the hadoop-common jar the same way)
  >> export CLASSPATH=/tmp/pig-0.11.0.jar:/tmp/piggybank.jar:/tmp/apache-commons-lang.jar

3. Create a directory named format
    >>mkdir format

4. Compile LEFTPAD.java; make sure all the jars are on the classpath, otherwise compilation will fail
    >>javac LEFTPAD.java

5. Move the class file into the format folder
    >>mv  LEFTPAD.class format

6. Create a jar file named leftformat.jar
    >>jar -cf leftformat.jar format/

7. The jar file is now created; register it in your Pig script

Example from command line:
$ mkdir format
$ javac LEFTPAD.java 
$ mv LEFTPAD.class format/
$ jar -cf leftformat.jar format/
$ ls
LEFTPAD.java    format      input.txt   leftformat.jar  script.pig
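
The reason for the format/ folder: the class is declared in package format, so the class loader looks for the entry format/LEFTPAD.class inside the registered jar. A jar that holds LEFTPAD.class at its top level cannot satisfy a lookup for format.LEFTPAD. As a rough sketch, a small Java helper can check the layout before you run the script; JarEntryCheck and its methods are hypothetical names, and the jar path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar holds the class file for a fully qualified class name.
final class JarEntryCheck {

    // Maps a class name such as "format.LEFTPAD" to the jar entry path "format/LEFTPAD.class".
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath contains the entry for className.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        String className = "format.LEFTPAD";
        System.out.println("Expected entry: " + entryNameFor(className));
        if (args.length > 0) {
            // Only opens a jar when a path is given, e.g. leftformat.jar
            System.out.println("Found in jar: " + containsClass(args[0], className));
        }
    }
}
```

Running it with leftformat.jar as the argument reports whether the expected entry is present; jar -tf leftformat.jar shows the same listing from the command line.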
Sivasakthi Jayaraman
  • Thanks for the answer. I have problems creating the jar file. I created LEFTPAD.class (above) and an extension.txt containing the following: Main-Class: LEFTPAD, and then ran: jar cfm leftformat.jar extension.txt LEFTPAD.class. The jar was created successfully, but when calling it from the Pig script I get an error message – clairvoyant Nov 17 '14 at 10:05
  • error message: Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve format.LEFTPAD using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] – clairvoyant Nov 17 '14 at 10:09
  • Try this: create a folder named "format" and compile your Java file LEFTPAD.java. LEFTPAD.class should be inside the "format" folder. Then create the jar file like this: " jar -cf leftformat.jar format/ ". After that, include this jar in your Pig script. Please let me know if you face any issues. – Sivasakthi Jayaraman Nov 17 '14 at 10:43
  • javac -g LEFTPAD.java LEFTPAD.java:3: package org.apache.commons.lang does not exist import org.apache.commons.lang.StringUtils; ^ LEFTPAD.java:4: package org.apache.pig does not exist import org.apache.pig.EvalFunc; ^ LEFTPAD.java:5: package org.apache.pig.data does not exist import org.apache.pig.data.Tuple; ^ LEFTPAD.java:7: cannot find symbol symbol: class EvalFunc public class LEFTPAD extends EvalFunc { ^ – clairvoyant Nov 17 '14 at 10:51
  • I'm probably doing something wrong, but I cannot compile; sorry, I don't know Java at all. I used your code as LEFTPAD.java and ran javac -g LEFTPAD.java. I have created a format subfolder. The error message is above (only part of it, because of the space limitation) – clairvoyant Nov 17 '14 at 10:53
  • It's working fine for me. Give me 10 minutes; let me check locally and come back. – Sivasakthi Jayaraman Nov 17 '14 at 11:00
  • I have added some steps to the answer; please follow them. Please make sure all 3 jar files are on the classpath. – Sivasakthi Jayaraman Nov 17 '14 at 11:40
  • thanks, much better now, but still getting one error yet: ~/W/F/g/f/test ❯❯❯ javac LEFTPAD.java LEFTPAD.java:12: cannot access org.apache.hadoop.io.WritableComparable class file for org.apache.hadoop.io.WritableComparable not found String input = ((String) arg0.get(0)); ^ 1 error – clairvoyant Nov 17 '14 at 12:11
  • I guess one more jar is missing for Hadoop? – clairvoyant Nov 17 '14 at 12:12
  • Yes, I have downloaded hadoop-common-0.23.9.jar and now it compiled successfully. – clairvoyant Nov 17 '14 at 12:18