
I have the following Java SE code, which runs on a PC:

public static void main(String[] args) {
    // stringCommaPattern will change
    // ","abc,def","
    // to
    // ","abcdef","        
    Pattern stringCommaPattern = Pattern.compile("(\",\")|,(?=[^\"[,]]*\",\")");
    String data = "\"SAN\",\"Banco Santander, \",\"NYSE\"";
    System.out.println(data);
    final String result = stringCommaPattern.matcher(data).replaceAll("$1");
    System.out.println(result);
}

I get the expected result:

"SAN","Banco Santander, ","NYSE"
"SAN","Banco Santander ","NYSE"

However, the same code behaves differently on Android:

Pattern stringCommaPattern = Pattern.compile("(\",\")|,(?=[^\"[,]]*\",\")");
String data = "\"SAN\",\"Banco Santander, \",\"NYSE\"";
Log.i("CHEOK", data);
final String result = stringCommaPattern.matcher(data).replaceAll("$1");
Log.i("CHEOK", result);

I'm getting

"SAN","Banco Santander, ","NYSE"
"SAN","Banco Santandernull ","NYSE"

Is there a workaround that makes this code behave the same way as it does on Java SE?


Additional Note :

Other patterns yield the same result as well. It seems that Android substitutes the string "null" for an unmatched group, whereas Java SE substitutes an empty string.

Take the following code.

public static void main(String[] args) {
    // Used to remove the commas within an integer. The integer must be located
    // between two strings. Each match is replaced with $1.
    //
    // digitPattern will change
    // ",100,000,"
    // to
    // ",100000,"        
    final Pattern digitPattern = Pattern.compile("(\",)|,(?=[\\d,]+,\")");
    String data = "\",100,000,000,\"";
    System.out.println(data);
    final String result = digitPattern.matcher(data).replaceAll("$1");
    System.out.println(result);
}

Java SE

",100,000,000,"
",100000000,"

Android

",100,000,000,"
",100null000null000,"
Cheok Yan Cheng
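The guess in the additional note can be checked on Java SE with a minimal sketch. The class and method names below are illustrative and not from the original post. It reuses the question's digitPattern and reports, for each match, whether group 1 took part in it. On Java SE, `Matcher.group(1)` returns `null` when the match came from the second alternative, and `replaceAll` substitutes nothing for such a group. The Android output above suggests that the Android implementation in use appended the text "null" instead, but that is an inference from the output, not something verified against the Android source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnmatchedGroupDemo {
    // Same regex as digitPattern in the question.
    static final Pattern DIGIT_PATTERN = Pattern.compile("(\",)|,(?=[\\d,]+,\")");

    // Lists each match as "startIndex=group1", writing <null> when
    // group 1 did not participate in the match.
    static String describeMatches(String data) {
        Matcher m = DIGIT_PATTERN.matcher(data);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String g1 = m.group(1);
            out.append(m.start()).append('=').append(g1 == null ? "<null>" : g1).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("\",100,000,000,\""));
    }
}
```

On Java SE this prints `0=", 5=<null> 9=<null>`: the two commas inside the number are matched by the second alternative, so group 1 is unset for exactly the matches where Android printed "null".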

1 Answer


This doesn't explain why it happens, but as a workaround you could write the appendReplacement loop yourself rather than using replaceAll:

StringBuffer result = new StringBuffer();
Matcher m = digitPattern.matcher(data);
while(m.find()) {
  m.appendReplacement(result, (m.group(1) == null ? "" : "$1"));
}
m.appendTail(result);

This should work on both Java SE and Android.
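For reference, the loop can be wrapped into a self-contained program. This is only a sketch: the class name and the `replaceWithGroup1` helper are made up for illustration, and it has been reasoned through for Java SE only. The point is that `$1` is never handed to the matcher when group 1 is unset, so the result no longer depends on how a platform expands an unmatched group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullSafeReplace {
    // Hypothetical helper: replaces each match with group 1 if that group
    // took part in the match, and with the empty string otherwise.
    static String replaceWithGroup1(Pattern pattern, String data) {
        // StringBuffer (not StringBuilder) keeps this usable on older runtimes.
        StringBuffer result = new StringBuffer();
        Matcher m = pattern.matcher(data);
        while (m.find()) {
            m.appendReplacement(result, m.group(1) == null ? "" : "$1");
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Same regex as digitPattern in the question.
        Pattern digitPattern = Pattern.compile("(\",)|,(?=[\\d,]+,\")");
        System.out.println(replaceWithGroup1(digitPattern, "\",100,000,000,\""));
    }
}
```

On Java SE this prints `",100000000,"`, the same as the original replaceAll call.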

Or sidestep the problem entirely by changing the regex:

Pattern commaNotBetweenQuotes = Pattern.compile("(?<!\"),(?!\")");
String result = commaNotBetweenQuotes.matcher(data).replaceAll("");

Here the regex matches just the commas you want to change, and not the ones you want to leave intact, so you can just replace them all with "" with no need for capturing groups.
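As a quick check, here is a small sketch that runs this lookaround regex against both sample strings from the question. The class and method names are illustrative only. Because the pattern has no capturing groups, the replacement string never refers to a group at all.

```java
import java.util.regex.Pattern;

class CommaLookaroundDemo {
    // A comma that is neither preceded nor followed by a double quote.
    static final Pattern COMMA_NOT_NEXT_TO_QUOTE = Pattern.compile("(?<!\"),(?!\")");

    // Removes every comma matched by the pattern above.
    static String stripCommas(String data) {
        return COMMA_NOT_NEXT_TO_QUOTE.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripCommas("\"SAN\",\"Banco Santander, \",\"NYSE\""));
        System.out.println(stripCommas("\",100,000,000,\""));
    }
}
```

On Java SE this prints `"SAN","Banco Santander ","NYSE"` followed by `",100000000,"`. Commas that touch a quote on either side are left alone, which is what keeps the field separators intact.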

Ian Roberts
  • Thanks! I really love your solution for commaNotBetweenQuotes. I tried to modify it to turn `" ,100,000,000, "` (space added) into `" ,100000000, "`. I used `Pattern.compile("(?<!\"\\s{0,0x7FFFFFFE}),(?!\\s{0,0x7FFFFFFE}\")")`, following http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java, but at runtime it throws `java.util.regex.PatternSyntaxException: Error in {min,max} interval near index 12`. Do you have any suggestions? – Cheok Yan Cheng Mar 29 '13 at 12:16
  • @CheokYanCheng I wasn't aware you could use hex literals in a bounded range operator. Maybe try something like `\\s{0,10}` with a decimal number instead? – Ian Roberts Mar 29 '13 at 13:30
  • Thanks. I was able to resolve the problem with `Pattern.compile("(?<!\"\\s{0," + (Integer.MAX_VALUE-1) + "}),(?!\\s{0," + (Integer.MAX_VALUE-1) + "}\")")`. You are right, it was a matter of hex literals. Note that I need the -1 because the one double quote already occupies part of the allowable length. – Cheok Yan Cheng Mar 29 '13 at 16:08
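The decimal-bound variant suggested in the comments can be sketched as follows. The bound of 10 whitespace characters is an assumption taken from the `\\s{0,10}` suggestion, not a value the thread settles on, and the class, constant, and method names are illustrative.

```java
import java.util.regex.Pattern;

class SpacedCommaDemo {
    // Assumed upper bound on the whitespace between a quote and a comma.
    static final int MAX_SPACES = 10;

    // A comma not preceded by quote-plus-whitespace and not followed by
    // whitespace-plus-quote. The bound is written in decimal, since the
    // {min,max} quantifier does not accept hex literals.
    static final Pattern COMMA_AWAY_FROM_QUOTE = Pattern.compile(
            "(?<!\"\\s{0," + MAX_SPACES + "}),(?!\\s{0," + MAX_SPACES + "}\")");

    // Removes every comma matched by the pattern above.
    static String stripCommas(String data) {
        return COMMA_AWAY_FROM_QUOTE.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripCommas("\" ,100,000,000, \""));
    }
}
```

On Java SE this prints `" ,100000000, "`. A small explicit bound keeps the lookbehind length obviously finite, at the cost of not covering runs of whitespace longer than the bound.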