
I am trying to use StringTemplate to generate Pig/Hadoop code. Since I am a novice, I couldn't figure it out myself. Any help will be appreciated.

I have a List of LocalDate like the one shown below:

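DateTimeFormatter formatter = DateTimeFormatter.BASIC_ISO_DATE; // parses "yyyyMMdd", e.g. 20100101
List<LocalDate> dates = Arrays.asList("20100101", "20100102").stream().map(d -> LocalDate.parse(d, formatter)).collect(Collectors.toList());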

The list can contain one date or many dates.

If the list "dates" contains more than one element, I would like to generate:

SPLIT finalizedEvents INTO splitByDay_20100101 IF dataDate == 20100101,
                  INTO splitByDay_20100102 IF dataDate == 20100102, ....; // for every date in the "dates" list
// similarly, one STORE for every date
// the substituted date must be formatted, e.g. 2010/01/01 instead of 20100101
STORE splitByDay_20100101 INTO '/a/b/2010/01/01' USING AvroStorage();
STORE splitByDay_20100102 INTO '/a/b/2010/01/02' USING AvroStorage();

If the list "dates" contains only one element, I would like to generate (assume dates = [20100101]):

splitByDay_20100101 = FOREACH finalizedEvents GENERATE $0..;
STORE splitByDay_20100101 INTO '/a/b/2010/01/01' USING AvroStorage();

So far I have done something like the following, but I'm not sure how to do the conditionals:

ST e = new ST("SPLIT finalizedEvents INTO <[dates]:{ d | IF split_<d> BY daysSinceEpoch == <d>}; separator=\", \">;");
e.add("dates", dates);
System.out.println(e.render());
user3138594

1 Answer


Here is what I came up with (detailed explanation below):

Java Code:

List<LocalDate> dates = new ArrayList<>();
dates.add(LocalDate.parse("20100101", DateTimeFormatter.BASIC_ISO_DATE));
dates.add(LocalDate.parse("20100201", DateTimeFormatter.BASIC_ISO_DATE));

List<List<Character>> charListList = new ArrayList<>();
for (LocalDate date : dates) {
    List<Character> charList = new ArrayList<>();
    char[] dateCharArray = date.toString().toCharArray();
    for (char c : dateCharArray) {
        charList.add(c);
    }
    charListList.add(charList);
}

STGroup dateGroup = new STGroupFile("./src/com/stackoverflow/DateList/dates.stg");
ST dateTemp = dateGroup.getInstanceOf("writeCode");
dateTemp.add("formattedDates", charListList);
dateTemp.add("isSingle", charListList.size() == 1);

System.out.println(dateTemp.render());


StringTemplate Code (dates.stg):

writeCode(formattedDates, isSingle) ::= <<
<if(isSingle)><writeSingleStuff(formattedDates)>
<else><writeMultipleStuff(formattedDates)>
<endif>
<writeStoreList(formattedDates)>
>>


writeSingleStuff(date)::= "<date:{d|splitByDay_<wordReplaceWEmpty(d)> = FOREACH finalizedEvents GENERATE $0..;}>"

writeMultipleStuff(rawDates)::= "SPLIT finalizedEvents <rawDates:{d|INTO splitByDay_<wordReplaceWEmpty(d)> IF dataDate == <wordReplaceWEmpty(d)>}; separator=\", \">;"

writeStoreList(formattedDates)::= "<formattedDates:{d|STORE splitByDay_<wordReplaceWEmpty(d)> INTO '/a/b/<wordReplaceWSlash(d)>' USING AvroStorage();<\n>}>"


wordReplaceWSlash(word) ::= "<word:{char|<charReplaceWSlash(char)>}>"

charReplaceWSlash(theChar) ::= <%<charReplaceWSlashMap.(theChar)>%>

charReplaceWSlashMap ::= [
    "-":"/",
    default:{<theChar>}
]


wordReplaceWEmpty(word) ::= "<word:{char|<charReplaceWEmpty(char)>}>"

charReplaceWEmpty(theChar) ::= <%<charReplaceWEmptyMap.(theChar)>%>

charReplaceWEmptyMap ::= [
    "-":"",
    default:{<theChar>}
]


What the code does:

Java Code:

List<LocalDate> dates = new ArrayList<>();
dates.add(LocalDate.parse("20100101", DateTimeFormatter.BASIC_ISO_DATE));
dates.add(LocalDate.parse("20100201", DateTimeFormatter.BASIC_ISO_DATE));

This is a list of LocalDates that we want to use as input for the templates. I parsed them with DateTimeFormatter.BASIC_ISO_DATE; LocalDate.toString() then renders them in ISO-8601 form, so e.g. 20100101 becomes 2010-01-01. We need this later because we will tell StringTemplate to replace - with either / or an empty string to get the two date formats that we want. (DateTimeFormatter.ofPattern("yyyy/MM/dd") would produce 2010/01/01 directly; the templates below do the replacement instead.)
Two alternative approaches would be:

  • Replace - with / in the Java code, not in StringTemplate. Then we would only have to replace / with an empty string where we need the compact format.
  • Add year, month and day to the template as three separate variables. Then we could concatenate them and insert / where needed.
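For comparison, here is a minimal sketch of both alternatives in plain Java, assuming only java.time from the standard library. The class name DateFormats and its methods are hypothetical. With formatters doing the work, no character replacement is needed in the template at all:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: computes both date renderings in Java instead of in the template.
class DateFormats {
    // Pattern formatter for the slash form used in the STORE path, e.g. 2010/01/01
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Approach 1: let a formatter produce each string directly.
    static String slashed(LocalDate d) {
        return d.format(SLASHED);
    }

    static String compact(LocalDate d) {
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20100101
        return d.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Approach 2: take year, month and day separately and join them with a separator.
    static String joined(LocalDate d, String sep) {
        return String.format("%04d%s%02d%s%02d",
                d.getYear(), sep, d.getMonthValue(), sep, d.getDayOfMonth());
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.parse("20100101", DateTimeFormatter.BASIC_ISO_DATE);
        System.out.println(slashed(d));     // 2010/01/01
        System.out.println(compact(d));     // 20100101
        System.out.println(joined(d, "/")); // 2010/01/01
        System.out.println(joined(d, ""));  // 20100101
    }
}
```

The two strings could then be passed to the template as separate attributes instead of a character list.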


List<List<Character>> charListList = new ArrayList<>();
for (LocalDate date : dates) {
    List<Character> charList = new ArrayList<>();
    char[] dateCharArray = date.toString().toCharArray();
    for (char c : dateCharArray) {
        charList.add(c);
    }
    charListList.add(charList);
}

It's not very pretty, but it's necessary:
We need a list (not an array) of the characters of each date so that StringTemplate can iterate over it, and there is no single standard-library call that converts a char[] to a List<Character> (Arrays.asList on a char[] returns a List<char[]>). Finally, we put all these lists into one list (charListList) so that code is generated for every date we have.
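If the explicit loop feels clumsy, a stream can build the same List<List<Character>>. This is a sketch assuming Java 9+ (for List.of in the demo); the class CharLists and its methods are hypothetical names:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: same result as the char[] loop, using String.chars().
class CharLists {
    static List<Character> toCharList(String s) {
        // chars() yields an IntStream of UTF-16 code units; cast each back to char (autoboxed)
        return s.chars()
                .mapToObj(c -> (char) c)
                .collect(Collectors.toList());
    }

    static List<List<Character>> toCharListList(List<LocalDate> dates) {
        List<List<Character>> result = new ArrayList<>();
        for (LocalDate date : dates) {
            // LocalDate.toString() is ISO-8601, e.g. "2010-01-01"
            result.add(toCharList(date.toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = List.of(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 2, 1));
        System.out.println(toCharListList(dates));
        // [[2, 0, 1, 0, -, 0, 1, -, 0, 1], [2, 0, 1, 0, -, 0, 2, -, 0, 1]]
    }
}
```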


STGroup dateGroup = new STGroupFile("./src/com/stackoverflow/DateList/dates.stg");
ST dateTemp = dateGroup.getInstanceOf("writeCode");
dateTemp.add("formattedDates", charListList);
dateTemp.add("isSingle", charListList.size() == 1);

System.out.println(dateTemp.render());

Here we fill the template with values. We have to tell StringTemplate whether charListList contains exactly one date, because StringTemplate by design has no comparison operators: an <if> can only test whether an attribute is present or true, so the length check has to happen in Java.


StringTemplate Code:

writeCode(formattedDates, isSingle) ::= <<
<if(isSingle)><writeSingleStuff(formattedDates)>
<else><writeMultipleStuff(formattedDates)>
<endif>
<writeStoreList(formattedDates)>
>>

This is the "root" template that basically just delegates the work to other templates. It handles the case distinction between one and many dates.
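When testing the templates, it can help to have the expected Pig text built independently. The following is a plain-Java sketch of the same case distinction, assuming the output shown in the question; PigSplitReference and generate are hypothetical names, and the template's own line breaks may differ, so compare with whitespace normalized:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java reference for the Pig code the templates should produce.
class PigSplitReference {
    private static final DateTimeFormatter COMPACT = DateTimeFormatter.BASIC_ISO_DATE;        // 20100101
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("yyyy/MM/dd"); // 2010/01/01

    static String generate(List<LocalDate> dates) {
        StringBuilder sb = new StringBuilder();
        if (dates.size() == 1) {
            // Single date: copy all fields with FOREACH instead of splitting
            sb.append("splitByDay_").append(dates.get(0).format(COMPACT))
              .append(" = FOREACH finalizedEvents GENERATE $0..;\n");
        } else {
            // Several dates: one SPLIT with an "INTO ... IF ..." branch per date
            String branches = dates.stream()
                    .map(d -> "INTO splitByDay_" + d.format(COMPACT)
                            + " IF dataDate == " + d.format(COMPACT))
                    .collect(Collectors.joining(", "));
            sb.append("SPLIT finalizedEvents ").append(branches).append(";\n");
        }
        // Both cases: one STORE statement per date
        for (LocalDate d : dates) {
            sb.append("STORE splitByDay_").append(d.format(COMPACT))
              .append(" INTO '/a/b/").append(d.format(SLASHED))
              .append("' USING AvroStorage();\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(List.of(LocalDate.of(2010, 1, 1))));
        System.out.print(generate(List.of(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 1, 2))));
    }
}
```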


writeSingleStuff(date)::= "<date:{d|splitByDay_<wordReplaceWEmpty(d)> = FOREACH finalizedEvents GENERATE $0..;}>"

writeMultipleStuff(rawDates)::= "SPLIT finalizedEvents <rawDates:{d|INTO splitByDay_<wordReplaceWEmpty(d)> IF dataDate == <wordReplaceWEmpty(d)>}; separator=\", \">;"

writeStoreList(formattedDates)::= "<formattedDates:{d|STORE splitByDay_<wordReplaceWEmpty(d)> INTO '/a/b/<wordReplaceWSlash(d)>' USING AvroStorage();<\n>}>"

These three templates write the code specific to a single date, the code specific to multiple dates, and the code that both cases have in common.
Although we know that date holds exactly one item in writeSingleStuff, it is still a list, so we have to iterate over it (the map simply runs once).


wordReplaceWSlash(word) ::= "<word:{char|<charReplaceWSlash(char)>}>"

charReplaceWSlash(theChar) ::= <%<charReplaceWSlashMap.(theChar)>%>

charReplaceWSlashMap ::= [
    "-":"/",
    default:{<theChar>}
]


wordReplaceWEmpty(word) ::= "<word:{char|<charReplaceWEmpty(char)>}>"

charReplaceWEmpty(theChar) ::= <%<charReplaceWEmptyMap.(theChar)>%>

charReplaceWEmptyMap ::= [
    "-":"",
    default:{<theChar>}
]

These two groups of templates do almost the same thing: they replace every - in each "word" (the character list of one date) with either / or an empty string. Each uses a small dictionary whose default entry maps every other character to itself.
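In Java terms, each dictionary behaves roughly like a map lookup with a fallback to the character itself. This is only an analogy sketch; CharReplace and replaceEach are hypothetical names and nothing here is StringTemplate API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical Java analogue of charReplaceWSlashMap / charReplaceWEmptyMap:
// look each char up in a small map, falling back to the char itself (the "default" entry).
class CharReplace {
    static String replaceEach(List<Character> word, Map<Character, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Character c : word) {
            sb.append(map.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Character> word = List.of('2', '0', '1', '0', '-', '0', '1', '-', '0', '1');
        System.out.println(replaceEach(word, Map.of('-', "/"))); // 2010/01/01
        System.out.println(replaceEach(word, Map.of('-', "")));  // 20100101
    }
}
```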

Colophonius