On regex101.com, I tested your pattern (unescaped, it is \son\s|\sinner\s|\souter\s|\sjoin\s) against this test string: table1 t1 inner join table2 t2 outer join table3. The only matches I got were for inner and outer. Since you're splitting the string on these tokens, that explains your result.
Perhaps this can help with your specific case. I went for a regex matching approach instead of splitting the data.
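    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PatternChecker {
        public static void main(String[] args) {
            String str = "table1 t1 inner join table2 t2 outer join table3";
            // A table name (table + digits), optionally followed by an alias
            // that has a space on each side.
            Pattern p = Pattern.compile("(table[0-9]+( [a-zA-Z0-9]+ )?)");
            Matcher m = p.matcher(str);
            while (m.find()) {
                // group(0) is the whole match; when an alias is present,
                // the match includes the trailing space.
                System.out.println(m.group(0));
            }
        }
    }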
Later edit
The split pattern \\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s did not work because of the mandatory whitespace on both sides of each keyword. In effect, you are searching for *on* or *inner* or *outer* or *join* (each asterisk marks a whitespace character), so the whitespace is part of the delimiter you're splitting on. *join* could not be matched because the whitespace on its left had already been consumed as the right-side whitespace of the preceding *inner* or *outer* match.
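To make this visible, here is a minimal sketch that splits the test string with the original pattern. The class and method names are made up for illustration. Each join stays glued to the following token because the space in front of it was already used up.

```java
import java.util.Arrays;

class OriginalSplitDemo {
    // The pattern from the question: every keyword needs whitespace on both sides.
    static final String ORIGINAL = "\\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s";

    static String[] splitWithOriginal(String input) {
        return input.split(ORIGINAL);
    }

    public static void main(String[] args) {
        String str = "table1 t1 inner join table2 t2 outer join table3";
        // " inner " and " outer " each consume the space that " join " would need,
        // so "join" survives at the start of the next token.
        System.out.println(Arrays.toString(splitWithOriginal(str)));
        // [table1 t1, join table2 t2, join table3]
    }
}
```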
Going back to the split solution, one fix would be to make the left-side whitespace of join optional via the ? quantifier. The new pattern would be \\son\\s|\\sinner\\s|\\souter\\s|\\s?join\\s. This yields some empty tokens, which can be filtered out.
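As a rough sketch of that first fix (the class and method names are hypothetical), the empty tokens can be dropped with a stream filter right after the split:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class OptionalWhitespaceSplit {
    // Same as the original pattern, except the whitespace before "join" is optional.
    static final String PATTERN = "\\son\\s|\\sinner\\s|\\souter\\s|\\s?join\\s";

    static List<String> tokens(String input) {
        return Arrays.stream(input.split(PATTERN))
                // "inner" directly followed by "join" leaves an empty token between them.
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "table1 t1 inner join table2 t2 outer join table3";
        System.out.println(tokens(str));
        // [table1 t1, table2 t2, table3]
    }
}
```

One caveat with this variant: since the leading whitespace is optional, the pattern would also match the tail of an identifier that happens to end in join.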
Another idea would be to treat the combinations that use join (i.e. inner join, outer join) as complete delimiters, which would lead to \\son\\s|\\sinner join\\s|\\souter join\\s. No empty tokens are generated. A plain join can be covered by appending |\\sjoin\\s, as in the example below.
    public class PatternChecker {
        public static void main(String[] args) {
            String str = "employee t1 inner join department t2 outer join job join table4 history on a=b";
            String[] tokens = str.split("\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s");
            for (String token : tokens) {
                System.out.println(token);
            }
            // Output
            // employee t1
            // department t2
            // job
            // table4 history
            // a=b
        }
    }
Note that, since you're also splitting on the on keyword, you can filter out all the resulting tokens that contain the equals sign.
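A possible way to do that filtering is sketched here. It assumes every join condition contains an equals sign; a condition such as a < b would slip through. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TableTokenFilter {
    static final String PATTERN = "\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s";

    static List<String> tables(String input) {
        return Arrays.stream(input.split(PATTERN))
                // Assumption: anything containing "=" is a join condition, not a table.
                .filter(token -> !token.contains("="))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "employee t1 inner join department t2 outer join job join table4 history on a=b";
        System.out.println(tables(str));
        // [employee t1, department t2, job, table4 history]
    }
}
```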
For a generic fix, you would need to isolate the part of the query between from and where and apply the idea above.
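A minimal sketch of that generic approach follows. It assumes a single from ... where pair, lowercase keywords, no subqueries, and join conditions that contain an equals sign; the class name, method name, and sample query are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FromClauseExtractor {
    // Lazily captures everything between the first "from" and the next "where".
    static final Pattern FROM_CLAUSE =
            Pattern.compile("\\bfrom\\s+(.*?)\\s+where\\b", Pattern.DOTALL);
    static final String JOIN_SPLIT = "\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s";

    static List<String> tables(String sql) {
        Matcher m = FROM_CLAUSE.matcher(sql);
        if (!m.find()) {
            return Collections.emptyList(); // no from ... where section found
        }
        return Arrays.stream(m.group(1).split(JOIN_SPLIT))
                .filter(token -> !token.contains("=")) // drop the join conditions
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sql = "select * from employee t1 inner join department t2 "
                + "on t1.id=t2.emp_id where t1.id > 5";
        System.out.println(tables(sql));
        // [employee t1, department t2]
    }
}
```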