3

I'm trying to write a Regex expression that can determine if a string contains an odd number of " - quotation marks.

An answerer on this question has accomplished something very similar for determining if a string of letters contains an odd number of a certain letter. However I am having trouble adapting it to my problem.

What I have so far, but is not exactly working:

String regexp = "(\\b[^\"]*\"(([^\"]*\"){2})*[^\"]*\\b)";
        Pattern pattern = Pattern.compile(regexp);
        Matcher matcher = pattern.matcher("bbacac");
        if(matcher.find()){
            System.out.println("Found");
        }
        else
            System.out.println("Not Found");
Community
  • 1
  • 1
CodyBugstein
  • 21,984
  • 61
  • 207
  • 363

6 Answers6

10

Regex is a fairly poor solution for this. <-- I though you were talking about nesting, not pair matching.

Iterating over all characters in the string, counting instances of " would be a faster and more efficient way to achieve this.

int quoteCount = 0;
for(char ch : inputString.toCharArray())
{
  if(ch == '"') quoteCount++;
}

boolean even = quoteCount % 2 == 0;
Oded
  • 489,969
  • 99
  • 883
  • 1,009
  • Todah! Can you explain why regex is a poor solution? I aggree it's a little to understand, but are there other reasons? – CodyBugstein May 31 '13 at 19:02
  • @Eng.Fouad - true. Missed the Java tag – Oded May 31 '13 at 19:03
  • @Oded That would only apply if there is a possibility of massive strings, and very large number of matches, correct? – CodyBugstein May 31 '13 at 19:10
  • 1
    @Oded: That question was about nesting, not pairing. And (if you read my answer) even the oldest regex engines can handle this perfectly well, even with escaped quotes (which a non-regex solution will have a harder time with). – Tim Pietzcker May 31 '13 at 19:10
  • @TimPietzcker - fair point. I mistook the problem to be a nesting issue. – Oded May 31 '13 at 19:15
7

If you want a regex, this is simple to accomplish:

boolean oddQuotes = subjectString.matches("[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*");

Explanation: (without all the Java quote escapes):

[^"]*"   # Match any number of non-quote characters, then a quote
(?:      # Now match an even number of quotes by matching:
 [^"]*"  #  any number of non-quote characters, then a quote
 [^"]*"  #  twice
)*       # and repeat any number of times.
[^"]*    # Finally, match any remaining non-quote characters

So far, this is probably slower than a simple "count the quotes" solution. But we can do one better: We can design the regex to also handle escaped quotes, i. e. not to count a quote if it's preceded by an odd number of backslashes:

boolean oddQuotes = subjectString.matches("(?:\\\\.|[^\\\\\"])*\"(?:(?:\\\\.|[^\\\\\"])*\"(?:\\\\.|[^\\\\\"])*\")*(?:\\\\.|[^\\\\\"])*");

Now admittedly, this looks horrible, but mainly because of Java's string escaping rules. The actual regex is straightforward:

(?:       # Match either
 \\.      # an escaped character
|         # or
 [^\\"]   # a character except backslash or quote
)*        # any number of times.
"         # Then match a quote.
(?:       # The rest of the regex works just the same way (as above)
 (?:\\.|[^\\"])*"
 (?:\\.|[^\\"])*"
)*
(?:\\.|[^\\"])*
Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • Fantastic! Can you explain it briefly? – CodyBugstein May 31 '13 at 19:00
  • 1
    You would do well to mention that `matches()` automatically anchors the matches--or even better, add explicit anchors to the regex. That [other question](http://stackoverflow.com/questions/10297907/regex-odd-number-of-occurences-of-a-char) only needed word boundaries, but this regex MUST be anchored at both ends if it's to have any meaning. – Alan Moore Jun 06 '13 at 06:42
1

Or, use a regex, replace everything except for quotation marks with empty strings, and check the length of the result.

paulmelnikow
  • 16,895
  • 8
  • 63
  • 114
1

Don't use regex for this. Just iterate through the characters in the string and count the "". It's going to be a lot more efficient. It's an O(n) algorithm.

Especially if it's simple and make the solution a lot easier to read than some obscure regex pattern.

boolean odd = false;
for(int i=0; i<s.length(); i++) {
  if(s.chartAt(i) == '\"') odd != odd;
}
mprivat
  • 21,582
  • 4
  • 54
  • 64
  • What's the complexity if I use regex? – CodyBugstein May 31 '13 at 18:58
  • 1
    At best O(n) depending on your regex, except with a complicated Regex that hurts maintainability of your code. It's not all about the number of lines needed, sometimes it pays to be explicit and verbose. – mprivat May 31 '13 at 19:11
0

You can use split and check if the nubmer of elements in the returned array is even or odd to gauge the odd or even-ness of that character's frequency

String s = ".. what ever is in your string";
String[] parts = s.split("\"");
if(parts.size()%2){
   //String has odd number of quotes
}else{
   //String has even number of quotes
}
Tucker
  • 7,017
  • 9
  • 37
  • 55
0

I would have to say it probably better to just count the number of "s manually, but if you really want a regular expression, here is one that should work:

"(^(([^\"]*\"){2})*[^\"]*$)"

I just bound the expression to the front and back of the string and make sure there are only pairs of "s, blindly absorbing anything not a " between them.

Cemafor
  • 1,633
  • 12
  • 27