15

I have a string like this:

a b c a b " a b " b a " a "

How do I match every a that is not part of a string delimited by "? I want to match everything that is bold here:

**a** b c **a** b " a b " b **a** " a "

I want to replace those matches (or rather remove them by replacing them with an empty string), so removing the quoted parts for matching won't work, because I want those to remain in the string. I'm using Ruby.

js-coder
  • A regex matches a single substring at a time. How to loop a regex is a feature of the hosting language. Which language are you using? – tripleee Jul 16 '12 at 11:15

3 Answers

28

Assuming the quotes are correctly balanced and there are no escaped quotes, then it's easy:

result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

This replaces every a with the empty string if and only if an even number of quotes follows the matched a.

Explanation:

a        # Match a
(?=      # only if it's followed by...
 (?:     # ...the following:
  [^"]*" #  any number of non-quotes, followed by one quote
  [^"]*" #  the same again, ensuring an even number
 )*      # any number of times (0, 2, 4 etc. quotes)
 [^"]*   # followed by only non-quotes until
 \Z      # the end of the string.
)        # End of lookahead assertion
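
To see which a's the lookahead actually accepts, here is a minimal sketch run against the string from the question. The constant and method names (OUTSIDE_A, unquoted_a_offsets, remove_unquoted_a) are made up for this illustration; only the pattern comes from the answer.

```ruby
# Hypothetical names for illustration; the pattern is the one shown above.
OUTSIDE_A = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/

# Collect the offset of every a that is followed by an even number of quotes.
def unquoted_a_offsets(subject)
  offsets = []
  subject.scan(OUTSIDE_A) { offsets << Regexp.last_match.begin(0) }
  offsets
end

# Remove those a's and leave everything else, including quoted text, in place.
def remove_unquoted_a(subject)
  subject.gsub(OUTSIDE_A, '')
end

subject = 'a b c a b " a b " b a " a "'
p unquoted_a_offsets(subject)    # => [0, 6, 20]
puts remove_unquoted_a(subject)  # the a's at offsets 12 and 24 survive
```

The a's at offsets 12 and 24 are each followed by an odd number of quotes, so the lookahead fails there and they stay in the string.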

If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but will be more complicated:

result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:

(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except backslash or quote
)       # End of alternation
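
A small comparison may help show why the longer pattern is needed. This is only a sketch: the names SIMPLE_A, ESCAPE_AWARE_A and remove_a are hypothetical, and the test string is invented for the example.

```ruby
# Hypothetical names; both patterns are copied from the answer above.
SIMPLE_A = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/
ESCAPE_AWARE_A = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

def remove_a(subject, pattern)
  subject.gsub(pattern, '')
end

# In a single-quoted Ruby literal, \" stays a backslash followed by a quote,
# so this string contains one quoted part with an escaped quote inside it.
subject = 'a "a\"a" a'

puts remove_a(subject, SIMPLE_A)        # counts the escaped quote and removes the wrong a's
puts remove_a(subject, ESCAPE_AWARE_A)  # removes only the first and the last a
```

The simple pattern sees three quotes and therefore misjudges which a's are outside, while the escape-aware pattern skips over \" as a single unit.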
Tim Pietzcker
10

js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

As you can see, the regex is really tiny compared with the regex in the accepted answer: ("[^"]*")|a

subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
replaced = subject.gsub(regex) {|m|$1}
puts replaced

See this live demo
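
The same idea also seems to work when the replacement is not empty. A rough sketch, where replace_unquoted_a is a hypothetical helper name: the left alternative consumes each quoted string whole and captures it in group 1, and the block puts it back unchanged; a bare a leaves group 1 nil, so the block falls through to the replacement.

```ruby
# Hypothetical helper; the regex is the one from this answer.
def replace_unquoted_a(subject, replacement)
  subject.gsub(/("[^"]*")|a/) do
    # Group 1 is set only when a whole quoted string was matched.
    Regexp.last_match(1) || replacement
  end
end

subject = 'a b c a b " a b " b a " a "'
puts replace_unquoted_a(subject, 'X')  # => X b c X b " a b " b X " a "
puts replace_unquoted_a(subject, '')   # same result as the {|m|$1} block
```

Note that the quoted parts are matched too, but because they are replaced with themselves they come out of gsub untouched.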

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

Community
zx81
  • Accidentally upvoted. This answer is not correct, as it will also match the quoted parts completely, instead of matching only 'a' characters outside of strings. The accepted answer works as intended. – Toastgeraet Jul 02 '20 at 12:52
0

Full-blown regex solution for regex lovers, without caring about performance or code readability.

This solution assumes that there is no escaping syntax (with escaping syntax, the a in "sbd\"a" is counted as inside the string).

Pseudocode:

processedString =
    inputString.replaceAll("\".*?\"", "")  // Remove all quoted strings
               .replaceFirst("\".*", "")   // Treat text after a lone quote as inside a quote

Then you can match the text you want in processedString. You can drop the second replace if you consider text after a lone quote to be outside any quoted string.

EDIT

In Ruby, the regex in the code above would be

/\".*?\"/

used with gsub

and

/\".*/

used with sub
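
Put together, the two calls might look like the sketch below. strip_quoted is a hypothetical method name, and the sketch assumes, as stated above, that there is no escaping syntax.

```ruby
# Hypothetical helper combining the gsub and sub calls described above.
def strip_quoted(input)
  input.gsub(/".*?"/, '')  # remove every complete quoted string
       .sub(/".*/, '')     # treat text after a lone quote as quoted, too
end

subject = 'a b c a b " a b " b a " a "'
processed = strip_quoted(subject)
puts processed.count('a')             # => 3, the a's outside the quotes
puts strip_quoted('a "b" a "a').inspect  # the trailing lone quote and its text are dropped
```

This only helps for finding or counting matches; the quoted parts are gone from the processed string, so it does not solve the in-place replacement by itself.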


To address the replacement problem, I'm not sure whether this is possible, but it is worth trying:

  • Declare a counter.
  • Use the regex /(\"|a)/ with gsub, and supply a block.
  • In the block, if the match is ", increment the counter and return " as the replacement (basically, no change). If the match is a, check whether the counter is even: if it is, supply your replacement string; otherwise, return whatever was matched.
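
The steps above can be sketched in Ruby roughly as follows. The method name and the quotes_seen variable are hypothetical, and the sketch assumes balanced quotes with no escaping.

```ruby
# Hypothetical helper implementing the counter idea described above.
def remove_unquoted_by_counting(subject, replacement = '')
  quotes_seen = 0
  subject.gsub(/["a]/) do |match|
    if match == '"'
      quotes_seen += 1  # every quote toggles between inside and outside
      match             # put the quote back unchanged
    elsif quotes_seen.even?
      replacement       # even count: we are outside a quoted string
    else
      match             # odd count: inside a quoted string, keep the a
    end
  end
end

subject = 'a b c a b " a b " b a " a "'
puts remove_unquoted_by_counting(subject, 'X')  # => X b c X b " a b " b X " a "
```

This works because gsub calls the block for the matches in left-to-right order, so the counter always reflects the number of quotes before the current match.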
nhahtdh
  • Does this have anything to do with 'a's as mentioned in the OP requirement? – El Ronnoco Jul 16 '12 at 11:24
  • @ElRonnoco: Yes. Instead of doing everything at once, I just remove all the quoted string, and leave only unquoted parts in the `processedString`. Then searching for text will be easy. My solution has assumption, though. – nhahtdh Jul 16 '12 at 11:26
  • My bad, I want to match them first and replace them afterwards. But I want the quoted parts to stay in the string. You are removing the quoted parts and then match all `a`s right? – js-coder Jul 16 '12 at 11:29
  • I think that Ruby doesn't have a `g` flag. – js-coder Jul 16 '12 at 11:31
  • @dotweb: Another solution is to replace `"` with special character that you are sure will not be in the input string, but it is very hacky solution and I wouldn't recommend it. – nhahtdh Jul 16 '12 at 11:44