
I need a Java RegEx to split, or find something in a string, but exclude stuff that's between double quotes. What I do now is this:

String withoutQuotes = str.replaceAll("\\\".*?\\\"", "placeholder");
withoutQuotes = withoutQuotes.replaceAll(" ","");

but this doesn't work well with indexOf, and I also need to be able to split, for example:

String str = "hello;world;how;\"are;you?\"";
String[] strArray = str.split(/*some regex*/);
// strArray now contains: ["hello", "world", "how", "\"are;you?\""]
  • quotes are always balanced
  • quotes can be escaped with \"

Any help is appreciated

Dirk
  • Are quotes always balanced? And can these quotes be escaped using `\"` – anubhava Nov 04 '13 at 15:46
  • oh sorry, forgot to mention. Yes, quotes are balanced, and yes, they can be escaped with \" – Dirk Nov 04 '13 at 15:50
  • What do you mean the first thing doesn't work nice with indexOf? – cangrejo Nov 04 '13 at 16:10
  • if you save the index, and then use it on the original string, it won't be the same char, because the placeholder will probably have a different length from the original quoted text – Dirk Nov 04 '13 at 16:31
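One way around the index drift described in the comments is to overwrite each quoted region with filler of the same length, so that any index found in the masked copy points at the same character in the original. The following is only a sketch under assumptions: the class name `QuoteMask`, the method `maskQuoted`, and the `'_'` filler are hypothetical, and the pattern assumes quotes are balanced and escaped as \" inside quoted text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteMask {
    // A double-quoted run; backslash escapes such as \" are allowed inside it.
    private static final Pattern QUOTED =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    /** Returns a copy of s with every quoted region overwritten by '_' (hypothetical filler). */
    static String maskQuoted(String s) {
        StringBuilder out = new StringBuilder(s);
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            // Overwrite in place so the masked copy keeps the original length.
            for (int i = m.start(); i < m.end(); i++) {
                out.setCharAt(i, '_');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "hello;world;how;\"are;you?\";bye";
        String masked = maskQuoted(str);
        // Search the masked copy, then use the index on the original string.
        int idx = masked.lastIndexOf(';');
        System.out.println(masked);
        System.out.println(idx + " -> " + str.charAt(idx));
    }
}
```

Because the lengths match, `indexOf` and `lastIndexOf` on the masked copy never report a position inside quotes, and the result can be used on the original string unchanged. If `'_'` can be something you search for, pick a different filler character.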

2 Answers

OK, here is code that should work for you:

String str = "a \"hello world;\";b \"hi there!\"";
String[] arr = str.split(";(?=(([^\"]*\"){2})*[^\"]*$)");
System.out.println(Arrays.toString(arr));

This regex matches a semicolon only if it is followed by an even number of double quotes up to the end of the string, which means the semicolon is outside quotes (this relies on the quotes being balanced).

OUTPUT:

[a "hello world;", b "hi there!"]

PS: It doesn't take care of escaped quotes like \"
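If escaped quotes do matter, the same lookahead idea can be extended so that a backslash followed by any character is consumed as one unit and never counted as a quote. This is a sketch rather than a tested drop-in: the class name `EscapedQuoteSplit` and the constant names are hypothetical, and it assumes that backslash is the only escape character and that unescaped quotes are balanced.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedQuoteSplit {
    // A run of characters that are neither a quote nor a backslash,
    // or a backslash escape such as \" taken as one unit.
    private static final String RUN = "(?:[^\"\\\\]|\\\\.)*";

    // A semicolon followed by zero or more complete quoted strings up to the
    // end of the input, i.e. an even number of unescaped quotes.
    private static final Pattern SEPARATOR =
            Pattern.compile(";(?=" + RUN + "(?:\"" + RUN + "\"" + RUN + ")*$)");

    static String[] split(String s) {
        return SEPARATOR.split(s);
    }

    public static void main(String[] args) {
        // The second field contains an escaped quote followed by a semicolon.
        String str = "a;\"b\\\";c\";d";
        System.out.println(Arrays.toString(split(str)));
    }
}
```

Note that the lookahead rescans the rest of the string for every semicolon, so on long inputs this does noticeably more work than a single left-to-right pass.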

anubhava

Resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

\"[^\"]*\"|(;)

The left side of the alternation matches complete quoted strings. We will ignore these matches. The right side matches and captures semi-colons to Group 1, and we know they are the right semi-colons because they were not matched by the expression on the left.

Here is working code (see online demo):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Program {
    public static void main(String[] args) {
        String subject = "hello;world;how;\"are;you?\"";
        Pattern regex = Pattern.compile("\"[^\"]*\"|(;)");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            // Group 1 is set only for a semicolon outside quotes: mark it as a split point.
            if (m.group(1) != null) m.appendReplacement(b, "SplitHere");
            // Otherwise copy the quoted string back unchanged; quoteReplacement
            // keeps any '$' or '\' in it from being read as replacement syntax.
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits) System.out.println(split);
    } // end main
} // end Program
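The same alternation can also drive the split directly by cutting the input at the offsets of the Group 1 matches, which avoids the intermediate "SplitHere" marker (a marker that would cause a wrong split if the input happened to contain that text). The sketch below is an assumption-laden variant, not part of the original answer: the class name `QuotedSplitter` is hypothetical, and the left side of the alternation has been widened so that escaped quotes (\") inside a quoted string are skipped as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplitter {
    // Left side: a complete quoted string (backslash escapes allowed), matched and ignored.
    // Right side: a semicolon captured in group 1, i.e. a real separator.
    private static final Pattern TOKEN =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|(;)");

    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int start = 0; // start of the field currently being collected
        while (m.find()) {
            if (m.group(1) != null) {
                // Separator outside quotes: close the current field here.
                parts.add(s.substring(start, m.start()));
                start = m.end();
            }
        }
        parts.add(s.substring(start)); // last field, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        String subject = "hello;world;how;\"are;you?\"";
        for (String part : split(subject)) {
            System.out.println(part);
        }
    }
}
```

Unlike `String.split`, this keeps trailing empty fields. Since `m.start()` and `m.end()` refer to the original string, the same loop can also be used to find the position of a separator outside quotes without any placeholder.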

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...
zx81