As matt b said, [^\\],
will interpret the character preceding the comma as a part of the delimiter.
"test\\\\\\,test\\\\,test\\,test,test"
-(split)->
["test\\\\\\,test\\\\,test\\,tes" , "test"]
As drvdijk said, (?<!\\),
will misinterpret escaped backslashes.
"test\\\\\\,test\\\\,test\\,test,test"
-(split)->
["test\\\\\\,test\\\\,test\\,test" , "test"]
-(unescape commas)->
["test\\\\,test\\,test,test" , "test"]
I would expect being able to escape backslashes as well...
"test\\\\\\,test\\\\,test\\,test,test"
-(split)->
["test\\\\\\,test\\\\" , "test\\,test" , "test"]
-(unescape commas and backslashes)->
["test\\,test\\" , "test,test" , "test"]
drvdijk suggested (?<=(?<!\\\\)(\\\\\\\\){0,100}),
which works well for lists with elements ending with up to 100 backslashes. This is far enough... but why a limit? Is there a more efficient way (isn't lookbehind greedy)? What about invalid strings?
I searched for a while for a generic solution, then I wrote the thing myself... The idea is to split following a pattern that matches the list elements (instead of matching the delimiter).
My answer does not take the escape character as a parameter.
public static List<String> commaDelimitedListStringToStringList(String list) {
// Check the validity of the list
// ex: "te\\st" is not valid, backslash should be escaped
if (!list.matches("^(([^\\\\,]|\\\\,|\\\\\\\\)*(,|$))+")) {
// Could also raise an exception
return null;
}
// Matcher for the list elements
Matcher matcher = Pattern
.compile("(?<=(^|,))([^\\\\,]|\\\\,|\\\\\\\\)*(?=(,|$))")
.matcher(list);
ArrayList<String> result = new ArrayList<String>();
while (matcher.find()) {
// Unescape the list element
result.add(matcher.group().replaceAll("\\\\([\\\\,])", "$1"));
}
return result;
}
Description for the pattern (unescaped):
(?<=(^|,))
forward is start of string or a ,
([^\\,]|\\,|\\\\)*
the element composed of \,
, \\
or characters wich are neither \
nor ,
(?=(,|$))
behind is end of string or a ,
The pattern may be simplified.
Even with the 3 parsings (matches
+ find
+ replaceAll
), this method seems faster than the one suggested by drvdijk. It can still be optimized by writing a specific parser.
Also, what is the need of having an escape character if only one character is special, it could simply be doubled...
public static List<String> commaDelimitedListStringToStringList2(String list) {
if (!list.matches("^(([^,]|,,)*(,|$))+")) {
return null;
}
Matcher matcher = Pattern.compile("(?<=(^|,))([^,]|,,)*(?=(,|$))")
.matcher(list);
ArrayList<String> result = new ArrayList<String>();
while (matcher.find()) {
result.add(matcher.group().replaceAll(",,", ","));
}
return result;
}