If not using regex is an option, you can write your own parser that iterates once over all characters in your string, checking whether each character is inside $...$, [...] or <...>.
- When you find a non-dot character, just add it to the token you are building, like any ordinary character.
- Do the same when you find a dot (.) that is inside one of the previously mentioned "areas".
- But if you find a dot and you are outside of these areas, you need to split on it, which means adding the currently built token to the result and clearing the buffer for the next token.
Such a parser can look like this:
// requires: import java.util.ArrayList; import java.util.List;
public static List<String> parse(String input) {
    // list which will hold the returned tokens
    List<String> tokens = new ArrayList<>();

    // flags representing whether the currently tested character is inside
    // one of the special areas
    // (at the start we are outside of these areas, so they are set to false)
    boolean insideDollar = false;         // $...$
    boolean insideSquareBrackets = false; // [...]
    boolean insideAngleBrackets = false;  // <...>

    // we need some buffer to build tokens, StringBuilder is excellent here
    StringBuilder sb = new StringBuilder();

    // now let's iterate over all characters and decide whether to add them
    // to the token or to add the token to the result list
    for (char ch : input.toCharArray()) {

        // let's update which area we are in
        // finding $ means that we either start or end a `$...$` area, so
        // simple negation of the flag is enough to update its status
        if (ch == '$') insideDollar = !insideDollar;
        // updating the rest of the flags is straightforward
        else if (ch == '[') insideSquareBrackets = true;
        else if (ch == ']') insideSquareBrackets = false;
        else if (ch == '<') insideAngleBrackets = true;
        else if (ch == '>') insideAngleBrackets = false;

        // now we know which area we are in, so let's handle the special cases
        // if the character is not a dot,
        // OR it is a dot but we are inside one of the areas, we just
        // add it to the token (append it to the StringBuilder)
        if (ch != '.' || insideAngleBrackets || insideDollar || insideSquareBrackets) {
            sb.append(ch);
        } else {
            // otherwise we are handling a dot outside of the special areas,
            // so it marks a place to split: we don't add it to the token,
            // but add the buffer's value (the current token) to the results
            // and reset the buffer for the next token
            tokens.add(sb.toString());
            sb.setLength(0);
        }
    }

    // since we only add the buffer's value to the list of tokens when we
    // find a dot to split on, the last token is not added inside the loop
    // unless the input ends with a dot, so we need to add it manually
    // after iterating over all characters
    if (sb.length() > 0) // a non-empty token needs to be added to the result
        tokens.add(sb.toString());

    return tokens;
}
and you can use it like this:
String input = "House.Car2.$abc.def$<ghi.jk>.[0]";
for (String s : parse(input))
    System.out.println(s);
Output:
House
Car2
$abc.def$<ghi.jk>
[0]
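
If you want to try this quickly, here is a condensed, self-contained sketch of the same single-pass approach. The class name `DotSplitter`, the shortened flag names and the extra inputs in `main` are my own additions for illustration; the logic is meant to match the parser above, so treat it as a sketch rather than a polished implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class, added only so the sketch compiles on its own.
public class DotSplitter {

    // Splits on dots that are outside $...$, [...] and <...> areas.
    public static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        boolean inDollar = false, inSquare = false, inAngle = false;
        StringBuilder sb = new StringBuilder();

        for (char ch : input.toCharArray()) {
            // update the area flags first
            if (ch == '$') inDollar = !inDollar;
            else if (ch == '[') inSquare = true;
            else if (ch == ']') inSquare = false;
            else if (ch == '<') inAngle = true;
            else if (ch == '>') inAngle = false;

            if (ch != '.' || inDollar || inSquare || inAngle) {
                sb.append(ch);             // ordinary character or protected dot
            } else {
                tokens.add(sb.toString()); // separator dot: finish current token
                sb.setLength(0);
            }
        }
        if (sb.length() > 0) tokens.add(sb.toString()); // last token, if any
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("House.Car2.$abc.def$<ghi.jk>.[0]"));
        System.out.println(parse("a..b"));        // consecutive dots
        System.out.println(parse("a.b."));        // trailing dot
        System.out.println(parse("[a.[b].c].d")); // nested brackets
    }
}
```

A few things this makes visible: two consecutive dots produce an empty token in the middle, while a trailing dot does not produce an empty token at the end (because of the `sb.length() > 0` check). Also, since the areas are tracked with plain booleans, nesting is not supported: in `[a.[b].c].d` the first `]` already ends the square-bracket area, so the dot right after it is treated as a separator. If your input can contain nested areas, you would probably want counters instead of flags.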