36

I have a string that looks something like the following:

12,44,foo,bar,(23,45,200),6

I'd like to create a regex that matches the commas, but only the commas that are not inside of parentheses (in the example above, all of the commas except for the two after 23 and 45). How would I do this (Java regular expressions, if that makes a difference)?

Tim Pietzcker
Paul Wicks

3 Answers

76

Assuming that there can be no nested parens (otherwise, you can't use a Java Regex for this task because recursive matching is not supported):

Pattern regex = Pattern.compile(
    ",         # Match a comma\n" +
    "(?!       # only if it's not followed by...\n" +
    " [^(]*    #   any number of characters except opening parens\n" +
    " \\)      #   followed by a closing parens\n" +
    ")         # End of lookahead", 
    Pattern.COMMENTS);

This regex uses a negative lookahead assertion to ensure that the next parenthesis after the comma (if any) is not a closing parenthesis. Only then is the comma allowed to match.
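As a rough illustration of how this pattern could be used, here is a minimal sketch that splits the example string on the matched commas. The class name `LookaheadSplitDemo` and the helper `splitOutsideParens` are made up for this example, and it assumes (as above) that parentheses are never nested.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LookaheadSplitDemo {
    // Same pattern as above, written on one line without Pattern.COMMENTS.
    static final Pattern COMMA_OUTSIDE_PARENS = Pattern.compile(",(?![^(]*\\))");

    // Hypothetical helper: splits on commas that the lookahead accepts.
    // Assumes non-nested, balanced parentheses.
    static String[] splitOutsideParens(String input) {
        return COMMA_OUTSIDE_PARENS.split(input);
    }

    public static void main(String[] args) {
        String subject = "12,44,foo,bar,(23,45,200),6";
        // prints [12, 44, foo, bar, (23,45,200), 6]
        System.out.println(Arrays.toString(splitOutsideParens(subject)));
    }
}
```

The two commas after 23 and 45 are skipped because the next parenthesis after each of them is a closing one.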

Tim Pietzcker
  • 16
    Nice demonstration of the use of `Pattern.COMMENTS`. This is what all Regex answers on stackoverflow should look like. – Wilt May 03 '16 at 09:12
  • @Tim is there any regex which I can use with this string "12,44,foo,bar,(23,45,200(10,11(23))),6". Above logic fails with string I mentioned. – fidato Jan 01 '18 at 15:36
  • @fidato: The Java regex engine doesn't support recursion or balancing which you'd need for this. Are you using a different language? – Tim Pietzcker Jan 01 '18 at 16:00
  • @TimPietzcker I am using ruby. I also posted question regarding the same over here: https://stackoverflow.com/questions/48049938/using-stringsplit-method – fidato Jan 01 '18 at 16:45
  • 1
    This matches A(BC`,`D(F)G. And doesn't represent nested parenthesis. Suggest this approach is a failure. And can never work to match a single parenthesis. –  Oct 19 '19 at 17:36
  • 1
    Precision to @RishiPithadiya comment, the pattern `,(?![^(]*\))` can match in other languages than Java (only one backslash). – Olou May 10 '20 at 11:18
  • @Olou Thanks for putting the pattern on one line like this - saved me time doing it myself! Haha. – Joshua Pinter Jul 19 '23 at 19:31
14

Paul, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

Also, the existing solution checks what follows the comma (whether the next parenthesis is a closing one), but that alone does not guarantee that the comma is actually enclosed in parentheses.

The regex is very simple:

\(.*?\)|(,)

The left side of the alternation matches a complete set of parentheses. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.

In this demo, you can see the Group 1 captures in the lower right pane.

You said you want to match the commas, but you can use the same general idea to split or replace.

To match the commas, you need to inspect Group 1. This full program's only goal in life is to do just that.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Program {
    public static void main(String[] args) {
        String subject = "12,44,foo,bar,(23,45,200),6";
        Pattern regex = Pattern.compile("\\(.*?\\)|(,)");
        Matcher regexMatcher = regex.matcher(subject);
        List<String> group1Caps = new ArrayList<String>();

        // Put Group 1 captures in a list; matches where Group 1 is null
        // are parenthesized blocks and are skipped.
        while (regexMatcher.find()) {
            if (regexMatcher.group(1) != null) {
                group1Caps.add(regexMatcher.group(1));
            }
        }

        // What are all the matches?
        System.out.println("\n" + "*** Matches ***");
        for (String match : group1Caps) {
            System.out.println(match);
        }
    }
}

Here is a live demo

To use the same technique for splitting or replacing, see the code samples in the article in the reference.
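For splitting, one possible sketch (not taken from the referenced article) is to walk the matches and cut the string at each Group 1 comma. The class name `AlternationSplitDemo` and the method `splitOnGroup1Commas` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationSplitDemo {
    // Left side consumes a parenthesized block; right side captures a comma.
    static final Pattern PARENS_OR_COMMA = Pattern.compile("\\(.*?\\)|(,)");

    // Hypothetical helper: cuts the input at every comma captured in Group 1.
    static List<String> splitOnGroup1Commas(String input) {
        List<String> pieces = new ArrayList<>();
        Matcher m = PARENS_OR_COMMA.matcher(input);
        int pieceStart = 0;
        while (m.find()) {
            if (m.group(1) != null) {
                // A comma outside parentheses: close the current piece here.
                pieces.add(input.substring(pieceStart, m.start(1)));
                pieceStart = m.end(1);
            }
            // Otherwise the match was a "(...)" block; keep scanning.
        }
        pieces.add(input.substring(pieceStart)); // trailing piece
        return pieces;
    }

    public static void main(String[] args) {
        // prints [12, 44, foo, bar, (23,45,200), 6]
        System.out.println(splitOnGroup1Commas("12,44,foo,bar,(23,45,200),6"));
    }
}
```

Like the pattern itself, this assumes parentheses are balanced and not nested.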

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...
Community
zx81
-5

I don’t understand this obsession with regular expressions, given that they are unsuited to most tasks they are used for.

// Everything before the opening parenthesis plus everything after the closing one.
String beforeParen = longString.substring(0, longString.indexOf('(')) + longString.substring(longString.indexOf(')') + 1);
int firstComma = beforeParen.indexOf(',');
while (firstComma != -1) {
    /* do something. */
    firstComma = beforeParen.indexOf(',', firstComma + 1);
}

(Of course this assumes that there is always exactly one opening parenthesis and one matching closing parenthesis somewhere after it.)
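If that assumption is too strict, a non-regex scan can instead keep a nesting-depth counter. The following is only a sketch of that idea, not the code from this answer; `DepthScanDemo` and `commasOutsideParens` are names invented for the example, and unmatched closing parentheses are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

class DepthScanDemo {
    // Hypothetical helper: returns the indices (in the original string)
    // of all commas that are not inside any parentheses.
    static List<Integer> commasOutsideParens(String input) {
        List<Integer> positions = new ArrayList<>();
        int depth = 0; // number of currently open parentheses
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--; // a stray ')' at depth 0 is ignored
            } else if (c == ',' && depth == 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [2, 5, 9, 13, 25]
        System.out.println(commasOutsideParens("12,44,foo,bar,(23,45,200),6"));
    }
}
```

Because it counts depth, this version also copes with several parenthesized groups and with nested ones.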

Bombe
  • 3
    And it assumes that there are no commas after the parenthesis. Did you test this? It even fails on the example string Paul supplied. Writing a correct parser that also doesn't choke on malformed input is probably just as hard as writing a correct regex (if not harder). I would *vastly* prefer a regex in this use case, provided the input conforms to defined criteria. – Tim Pietzcker Jan 27 '12 at 12:06
  • You’re right, I ignored the part after the closing parenthesis. Fixed. :) – Bombe Jan 27 '12 at 12:13
  • 2
    What do you do with input like `1,2,(3,4),5,6,(7,8)`? – Tim Pietzcker Jan 27 '12 at 12:32
  • Sorry, but unless the specification of the problem gets a lot more detailed I refuse to play along to your let-me-break-your-parser game. :) – Bombe Jan 27 '12 at 12:33
  • Oh, also, I changed the assumptions in my reply: if you have more than one pair of parentheses it will break. – Bombe Jan 27 '12 at 12:34
  • 1
    If OP's scenario isn't what regex should be used for, then I'm not sure what it *should* be used for. – Qix - MONICA WAS MISTREATED May 24 '13 at 11:26