5

Basically, my goal is to remove everything inside ()'s except for strings that are inside "".

I was following the code here: Remove text in-between delimiters in a string (using a regex?)

And that works great, but I have the additional requirement of not removing ()s if they are in "". Is that something that can be done with a regular expression? I feel like I'm dangerously close to needing another approach, like a true parser.

This is what I've been using:

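// Requires: using System.Text.RegularExpressions;
string RemoveBetween(string s, char begin, char end)
{
    // Regex.Escape keeps any delimiter literal; .*? is lazy, so each match stops at the first closing delimiter.
    Regex regex = new Regex(Regex.Escape(begin.ToString()) + ".*?" + Regex.Escape(end.ToString()));
    return regex.Replace(s, string.Empty);
}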
Rob P.
    Is there also a requirement that the user can insert a double-quote within the double-quotes using an escape character? ("The dog said \"Woof\"") – Andrew Shepherd Jun 05 '11 at 23:09

3 Answers

3

I don't speak C#, but here's a Java implementation:

input.replaceAll("(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))", "$2");

This produces the following results:

"foo (bar \"hello world\" foo) bar" --> "foo (hello world) bar"
"foo (bar foo) bar" --> "foo () bar"

It wasn't clear whether you wanted to preserve the quotes. If you do, use $1 instead of $2.

Now that you've got a working regex, you should be able to make it work for you in C#.
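For anyone who wants to try this out, here is the same pattern split into commented pieces and wrapped in a small helper. This is only a sketch: the class name `KeepQuotedInParens`, the method `strip`, and the `keepQuotes` flag are made up for illustration and are not part of the answer above.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
class KeepQuotedInParens {
    private static final Pattern PATTERN = Pattern.compile(
          "(?<=\\()"          // lookbehind: the match starts right after a "("
        + ".*?(?=[\"()])"     // lazily skip text up to the first quote or parenthesis
        + "(\"([^\"]*)\")?"   // optional quoted string: group 1 with quotes, group 2 without
        + ".*(?=\\))"         // greedily skip the rest, stopping just before a ")"
    );

    // keepQuotes = true keeps the surrounding quotes ($1), false drops them ($2).
    static String strip(String input, boolean keepQuotes) {
        return PATTERN.matcher(input).replaceAll(keepQuotes ? "$1" : "$2");
    }
}
```

One thing to watch for: the trailing `.*` is greedy, so if a line contains several parenthesised groups, the match appears to run all the way to the last `)` on that line rather than stopping at the first one.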

Bohemian
3

.NET regexes are more powerful than most flavors, and you can surely do what you want. Take a look at the post below, which matches balanced parentheses; that is essentially the same problem as yours, but with parentheses rather than quotes.

http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

Mark Sowul
2

It's risky to say "No you can't" on this forum, because somebody will go and ruin it by providing a working answer. :-)

But I will say that this would be really stretching regular expressions, and your problem elegantly lends itself to automata-based programming.

Personally, I'm happier maintaining a 20-line finite state machine than a 10-character regular expression.
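To give an idea of what such a state machine could look like, here is a minimal sketch in Java (it should port to C# almost line for line). It rests on my own reading of the requirement, so treat these as assumptions: parenthesised text is dropped together with its parentheses, quoted strings inside parentheses are kept with their quotes, and parentheses that sit inside a quoted string are treated as plain text. Nested parentheses and escaped quotes (\") are not handled. The names `ParenStripper` and `strip` are made up for the example.

```java
// Hypothetical four-state machine; not taken from any library.
class ParenStripper {
    private enum State { OUTSIDE, QUOTE_OUTSIDE, IN_PARENS, QUOTE_IN_PARENS }

    static String strip(String s) {
        StringBuilder out = new StringBuilder();
        State state = State.OUTSIDE;
        for (char c : s.toCharArray()) {
            switch (state) {
                case OUTSIDE:
                    if (c == '(') {
                        state = State.IN_PARENS;          // drop the "(" and start skipping
                    } else {
                        out.append(c);
                        if (c == '"') state = State.QUOTE_OUTSIDE;
                    }
                    break;
                case QUOTE_OUTSIDE:
                    out.append(c);                        // parentheses here are literal text
                    if (c == '"') state = State.OUTSIDE;
                    break;
                case IN_PARENS:
                    if (c == ')') {
                        state = State.OUTSIDE;            // drop the ")" and resume copying
                    } else if (c == '"') {
                        out.append(c);                    // quoted strings survive
                        state = State.QUOTE_IN_PARENS;
                    }                                     // anything else is dropped
                    break;
                case QUOTE_IN_PARENS:
                    out.append(c);
                    if (c == '"') state = State.IN_PARENS;
                    break;
            }
        }
        return out.toString();
    }
}
```

With these rules, `foo (bar "hello world" foo) bar` becomes `foo "hello world" bar`. Supporting an escape character would mean adding one more state (or a flag) for "previous character was a backslash", which is where this approach tends to stay readable while a regex does not.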

Andrew Shepherd