I have these strings:
wordsExpanded="test | is | [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] | test | [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] | [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]"
interpretation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}"
What I need as output is a string like this:
finalOutput="test | is | thirty four | test | 3 | 1 "
Basically, the interpretation string contains the information needed to determine which alternative of each bracketed group was used. For the first group, <number_type_0 words> was used, so the proper string is "(thirty four)" and not "( 3 4 )". For the second group it would be "( 3 )", and for the third "( 1 )".
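To make that selection rule concrete, here is a minimal sketch of how the interpretation string could be turned into a lookup table from tag name to selected variant. The class name InterpretationLookup and the method parse are hypothetical names used only for illustration, and the sketch assumes every tag has the form <name variant> with a single space-separated variant.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class InterpretationLookup {

    // Assumes each tag looks like <number_type_N words> or <number_type_N digits>.
    private static final Pattern TAG = Pattern.compile("<(\\w+)\\s+(\\w+)>");

    // Maps each tag name (e.g. "number_type_0") to the variant that was selected.
    static Map<String, String> parse(String interpretation) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Matcher m = TAG.matcher(interpretation);
        while (m.find()) {
            chosen.put(m.group(1), m.group(2));
        }
        return chosen;
    }

    public static void main(String[] args) {
        String interpretation =
                "{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
        // prints {number_type_2=digits, number_type_1=digits, number_type_0=words}
        System.out.println(parse(interpretation));
    }
}
```

With such a table, choosing between "(thirty four)" and "( 3 4 )" comes down to looking up number_type_0 and comparing the result with the variant named in each alternative's tag.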
Here is my code so far:
package com.test.prova;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Prova {
    public static void main(String[] args) {
        String nlInterpretation="{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
        String inputText="this is 34 test 3 1";
        String grammar="test is [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] test [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";

        // Tokenize the grammar: bare words, [bracketed groups], or 'quoted' text.
        List<String> matchList = new ArrayList<String>();
        Pattern regex = Pattern.compile("[^\\s\"'\\[]+|\\[([^\\]]*)\\]|'([^']*)'");
        Matcher regexMatcher = regex.matcher(grammar);
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                matchList.add(regexMatcher.group(1));
            } else if (regexMatcher.group(2) != null) {
                matchList.add(regexMatcher.group(2));
            } else {
                matchList.add(regexMatcher.group());
            }
        }
        String[] xx = matchList.toArray(new String[0]);
        String[] yy = inputText.split(" ");

        // Collect the contents of each <...> tag in the interpretation.
        matchList = new ArrayList<String>();
        regex = Pattern.compile("[^<]+|<([^>]*)>");
        regexMatcher = regex.matcher(nlInterpretation);
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                matchList.add(regexMatcher.group(1));
            }
        }
        String[] zz = matchList.toArray(new String[0]);
        System.out.println(String.join(" | ",zz));

        for (int i=0; i<xx.length; i++) {
            if (xx[i].contains("number_type_")) {
                // Attempt to split a bracketed group into its alternatives
                // (this is the part that does not work as intended).
                matchList = new ArrayList<String>();
                regex = Pattern.compile("[^\\(]+|<([^\\)]*)>.*[^<]+|<([^>]*)>");
                regexMatcher = regex.matcher(xx[i]);
                while (regexMatcher.find()) {
                    if (regexMatcher.group(1) != null) {
                        matchList.add(regexMatcher.group(1));
                    } else if (regexMatcher.group(2) != null) {
                        matchList.add(regexMatcher.group(2));
                    } else {
                        matchList.add(regexMatcher.group());
                    }
                }
                System.out.println(String.join(" | ",matchList.toArray(new String[0])));
            }
            System.out.printf("%02d\t%s\t->%s\n", i, yy[i], xx[i]);
        }
    }
}
The generated output is as follows:
number_type_2 digits | number_type_1 digits | number_type_0 words
00 this ->test
01 is ->is
thirty four) {<number_type_0 words>} | 3 4 ) {<number_type_0 digits>}
02 34 ->(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}
03 test ->test
three) {<number_type_1 words>} | 3 ) {<number_type_1 digits>}
04 3 ->(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}
one) {<number_type_2 words>} | 1 ) {<number_type_2 digits>}
05 1 ->(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}
What I would like is more like this:
number_type_2 digits | number_type_1 digits | number_type_0 words
00 this ->test
01 is ->is
02 34 ->thirty four
03 test ->test
04 3 ->3
05 1 ->1
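For reference, the reduction I am after could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive fix: the class GroupResolver and the method resolve are hypothetical names, each alternative is assumed to have the exact shape ( text ) {<tag>}, and a tag is treated as selected when it appears verbatim in the interpretation string. Groups with no selected alternative are left unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class GroupResolver {

    // One bracketed group: [ ... ]
    private static final Pattern GROUP = Pattern.compile("\\[([^\\]]*)\\]");

    // One alternative inside a group: ( text ) {<tag>}
    // Group 1 is the text without surrounding spaces, group 2 is the full <tag>.
    private static final Pattern ALTERNATIVE =
            Pattern.compile("\\(\\s*([^)]*?)\\s*\\)\\s*\\{(<[^>]*>)\\}");

    // Replaces every bracketed group with the text of the alternative
    // whose tag occurs verbatim in the interpretation string.
    static String resolve(String wordsExpanded, String interpretation) {
        Matcher group = GROUP.matcher(wordsExpanded);
        StringBuffer out = new StringBuffer();
        while (group.find()) {
            String replacement = group.group(); // fallback: keep the group as is
            Matcher alt = ALTERNATIVE.matcher(group.group(1));
            while (alt.find()) {
                if (interpretation.contains(alt.group(2))) {
                    replacement = alt.group(1);
                    break;
                }
            }
            group.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        group.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String wordsExpanded = "test | is | [(thirty four) {<number_type_0 words>}( 3 4 ) {<number_type_0 digits>}] | test | [(three) {<number_type_1 words>}( 3 ) {<number_type_1 digits>}] | [(one) {<number_type_2 words>}( 1 ) {<number_type_2 digits>}]";
        String interpretation = "{<number_type_2 digits> <number_type_1 digits> <number_type_0 words>}";
        // prints: test | is | thirty four | test | 3 | 1
        System.out.println(resolve(wordsExpanded, interpretation));
    }
}
```

Matching the groups first and the alternatives second keeps each regular expression small. Note that a selected digits alternative such as ( 3 4 ) would come out as "3 4" with the inner space preserved; joining it into "34" would need an extra step.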