It seems that you just used the regex from this answer, but as you can see, it doesn't use split but the find method from the Matcher class. That answer also takes care of ', which your input shows no signs of.
So you can improve this regex by removing the parts that handle ', which will make it look like
[^\\s\"]+|\"([^\"]*)\"
Also, since you want to include " as part of the token, you don't need to put the match between the " characters in a separate group, so get rid of the parentheses in the \"([^\"]*)\" part:
[^\\s\"]+|\"[^\"]*\"
Now all you need to do is add the case where there is no closing " and you reach the end of the string instead. So change this regex to
[^\\s\"]+|\"[^\"]*(\"|$)
After this you can just use Matcher, find all the tokens and store them somewhere, let's say in a List.
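Before the full example, it may help to see what the (\"|$) alternative actually changes. The sketch below runs the regex with and without it on the same input. The class name QuoteRegexDemo and the helper tokenize are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteRegexDemo {

    // Hypothetical helper: collects every match of the given regex in the input.
    static List<String> tokenize(String regex, String data) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String data = "It is fun \"to write\" regular\"expression";

        // Without the end-of-string alternative the unclosed quote cannot match,
        // so find() skips it and the last token loses its leading quote.
        System.out.println(tokenize("[^\\s\"]+|\"[^\"]*\"", data));

        // With (\"|$) the unclosed quote stays part of the last token.
        System.out.println(tokenize("[^\\s\"]+|\"[^\"]*(\"|$)", data));
    }
}
```

The first call should give It, is, fun, "to write", regular, expression (the quote is silently dropped), while the second keeps "expression as a single token.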
Example:
String data = "It is fun \"to write\" regular\"expression";
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"]+|\"[^\"]*(\"|$)");
Matcher regexMatcher = regex.matcher(data);
while (regexMatcher.find()) {
    System.out.println(regexMatcher.group());
    matchList.add(regexMatcher.group());
}
Output:
It
is
fun
"to write"
regular
"expression
A more complex expression that handles this kind of data with split could look like
String data = "It is fun \"to write\" regular \"expression";
for(String s : data.split("(?<!\\G)(?<=\\G[^\"]*(\"[^\"]{0,100000}\")?[^\"]*)((?<=\"(?!\\s))|\\s+|(?=\"))"))
    System.out.println(s);
but this approach is far more complicated than writing your own parser.
Such a parser could look like
public static List<String> parse(String data) {
    List<String> tokens = new ArrayList<String>();
    StringBuilder sb = new StringBuilder();
    boolean insideQuote = false;
    char previous = '\0';
    for (char ch : data.toCharArray()) {
        if (ch == ' ' && !insideQuote) {
            if (sb.length() > 0 && previous != '"')
                addTokenAndResetBuilder(sb, tokens);
        } else if (ch == '"') {
            if (insideQuote) {
                sb.append(ch);
                addTokenAndResetBuilder(sb, tokens);
            } else {
                addTokenAndResetBuilder(sb, tokens);
                sb.append(ch);
            }
            insideQuote = !insideQuote;
        } else {
            sb.append(ch);
        }
        previous = ch;
    }
    addTokenAndResetBuilder(sb, tokens);
    return tokens;
}

private static void addTokenAndResetBuilder(StringBuilder sb, List<String> list) {
    if (sb.length() > 0) {
        list.add(sb.toString());
        sb.delete(0, sb.length());
    }
}
Usage
String data = "It is fun \"to write\" regular\"expression\"xxx\"yyy";
for (String s : parse(data))
    System.out.println(s);
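Note that parse treats only the space character as a separator, while the regex version uses \s and therefore also splits on tabs and newlines. If you want the hand-written parser to behave the same way, one possible variation is sketched below. The class name WhitespaceAwareParser and the helper name flush are hypothetical, and using Character.isWhitespace(ch) as the definition of "separator" is an assumption, not part of the original code. The previous variable is left out here because the builder is already empty right after a closing quote.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceAwareParser {

    public static List<String> parse(String data) {
        List<String> tokens = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean insideQuote = false;
        for (char ch : data.toCharArray()) {
            if (Character.isWhitespace(ch) && !insideQuote) {
                // Assumption: any whitespace outside quotes ends the current token.
                flush(sb, tokens);
            } else if (ch == '"') {
                if (insideQuote) {
                    // Closing quote: it belongs to the token being built.
                    sb.append(ch);
                    flush(sb, tokens);
                } else {
                    // Opening quote: finish the pending token, then start a quoted one.
                    flush(sb, tokens);
                    sb.append(ch);
                }
                insideQuote = !insideQuote;
            } else {
                sb.append(ch);
            }
        }
        // Emit whatever is left, including an unclosed quoted token.
        flush(sb, tokens);
        return tokens;
    }

    // Adds the builder's content as a token (if non-empty) and clears the builder.
    private static void flush(StringBuilder sb, List<String> tokens) {
        if (sb.length() > 0) {
            tokens.add(sb.toString());
            sb.setLength(0);
        }
    }

    public static void main(String[] args) {
        String data = "It\tis fun \"to\twrite\" regular\"expression\"xxx\"yyy";
        for (String s : parse(data)) {
            System.out.println(s);
        }
    }
}
```

With this input the tab between It and is separates two tokens, while the tab inside "to	write" is kept because it sits between quotes.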