17

I want to match strings like:

The sentence is 'He said "Hello there"'
The sentence is "He said 'Hello there'"

and get back a single capture (match) that is the sentence inside the outer single or double quotes.

^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$

The above regex gives me back 2 captured groups, one of them empty and the other containing the desired sentence.

^The sentence is (['"])(.*)\1$

Returns the quotation mark (single or double quote) as the 1st group and the sentence as the 2nd group.

If I make the first group non-capturing,

^The sentence is (?:['"])(.*)\1$

then I can no longer use the backreference to the quote. (The \1 now refers to the first remaining capturing group, not to the single or double quote match.)

Is there a way to have groups whose "capture" can be referenced later in the regex, but whose captured value is not returned in the list of matches?

Or is there some other way to solve my (seemingly simple) problem?
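For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. It assumes Python's re module (the question does not name a regex flavor), and the names ALTERNATION, BACKREF, NAMED and sentence_of are made up for illustration. The named-group variant does not hide the quote group from the match; it only lets you fetch the sentence by name instead of by position.

```python
import re

# Patterns copied from the question.
ALTERNATION = re.compile(r"""^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$""")
BACKREF = re.compile(r"""^The sentence is (['"])(.*)\1$""")

# Assumption: Python named-group syntax; other flavors spell this differently.
NAMED = re.compile(r"""^The sentence is (?P<quote>['"])(?P<sentence>.*)(?P=quote)$""")

single_outer = """The sentence is 'He said "Hello there"'"""
double_outer = '''The sentence is "He said 'Hello there'"'''


def sentence_of(text):
    """Return the text inside the outer quotes, or None if there is no match."""
    m = NAMED.match(text)
    return m.group("sentence") if m else None


# Two groups, one of them unset (None in Python).
print(ALTERNATION.match(single_outer).groups())
# The quote character first, then the sentence.
print(BACKREF.match(double_outer).groups())
# Only the sentence, fetched by name.
print(sentence_of(single_outer))
print(sentence_of(double_outer))
```

With the alternation pattern, m.group(m.lastindex) is another way to pick whichever of the two groups actually matched.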

Phil Davis
  • If you also expect escaped quotes inside, [see this answer](https://stackoverflow.com/a/10786066/5527985). The [technique used](http://www.softec.lu/site/RegularExpressions/UnrollingTheLoop) is even of nice performance. – bobble bubble Nov 20 '19 at 09:39

4 Answers

15

Sadly, this elegant-looking approach does not work, because in most regex flavors a backreference such as \1 is not recognized inside a character class:

(["'])(?:\\\1|[^\1]+)*\1

But with a small change, it works:

(["'])((?:\\\1|(?:(?!\1)).)*)(\1)

https://regex101.com/r/dKdBMT/2

I have not verified that this regex works in every case, so please test it further.
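As a starting point for that testing, here is a small sketch that assumes Python's re module (the regex101 demo may use a different flavor). QUOTED is a hypothetical name; group 2 holds the text between the quotes.

```python
import re

# The working pattern from this answer, unchanged.
# Group 1: opening quote, group 2: content, group 3: closing quote.
QUOTED = re.compile(r"""(["'])((?:\\\1|(?:(?!\1)).)*)(\1)""")

samples = [
    """The sentence is 'He said "Hello there"'""",
    '''The sentence is "He said 'Hello there'"''',
    r"""say 'sing"le q\'uote' now""",   # escaped quote inside
    "The sentence is \"abc 123'",       # unbalanced: no match expected
]

for text in samples:
    m = QUOTED.search(text)
    print(repr(text), "->", repr(m.group(2)) if m else None)
```

The negative lookahead (?!\1) does the job that [^\1] was meant to do: it consumes one character at a time, as long as that character is not the opening quote.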

redisko
13

This one seems to work:

(?:'|").*(?:'|")

or

((?:'|").*(?:'|"))

if you need a group.

Here's the demo: link

It works because * is a greedy quantifier: .* takes as much as possible, so you don't have to know what kind of quote is at the end.
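A quick sketch of the greedy behaviour, assuming Python's re module; GREEDY and the two sample strings are just illustrative names. Because either kind of quote may close the match, a string with unbalanced quotes is accepted as well.

```python
import re

# First pattern from this answer: any quote, anything, any quote.
GREEDY = re.compile(r"""(?:'|").*(?:'|")""")

balanced = """The sentence is 'He said "Hello there"'"""
unbalanced = '''The sentence is 'He said "Goodbye"'''

# .* runs to the end of the string, then backs off to the last quote it can find.
print(GREEDY.search(balanced).group(0))
# The opening ' is paired with the final ", so the unbalanced string matches too.
print(GREEDY.search(unbalanced).group(0))
```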

Egan Wolf
  • The first example does not actually capture anything. The second example captures the sentence including the outer single or double quotes. A combination works: `(?:'|")(.*)(?:'|")` – Phil Davis Oct 27 '17 at 04:45
  • I thought you wanted to capture the quotation marks as well; I didn't go back to the question to check. I'm glad it helped. – Egan Wolf Oct 27 '17 at 04:49
  • Not knowing whether the first quote is single or double means that unbalanced sentences get matched: The sentence is `"abc 123'` and `'He said "Goodbye"` and the last of those returns `He said "Goodbye` as the sentence. It would be nice to not match such strings with unbalanced quotes. – Phil Davis Oct 27 '17 at 04:53
  • Check this regex `(?:'|")(.*(?:'|").*(?:'|"))(?:'|")` and [demo](https://regex101.com/r/OVdomu/2). It may still not be perfect, as I don't know what kinds of cases you have, but it should help you. – Egan Wolf Oct 27 '17 at 05:28
  • No. This string will match the regex: `'ss"` – tgoneil Mar 06 '18 at 18:24
  • It does match the first and second strings, but it also matches the last one: (a) `The sentence is 'He said "Hello there"'`, (b) `The sentence is "He said 'Hello there'"`, (c) `The sentence is "He said 'Hello there"'`. I feel that regex is not the safest way to go here. – Stanislav Jan 05 '21 at 11:40
5

You want to make sure the quote symbols are properly matched, so a quote starting with a single quote ends with a single quote. Also, the regex should allow for escaping a quote symbol with a backslash if it's the same symbol (double or single quote symbol) bounding the string. Try this:

"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'

These samples match this regex:

'sing"le q\'uote'

"dou\"ble 'quote"
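Here is a small check of the two samples, sketched with Python's re module as an assumption. The pattern is split into two hypothetical pieces, DOUBLE and SINGLE, only to keep the Python string quoting readable; joined with | they are the regex above.

```python
import re

# The two alternatives of the regex above, one per quote style.
DOUBLE = r'"(?:[^"\\]|\\.)*"'
SINGLE = r"'(?:[^'\\]|\\.)*'"
QUOTED = re.compile(DOUBLE + "|" + SINGLE)

sample_single = r"""'sing"le q\'uote'"""
sample_double = r'''"dou\"ble 'quote"'''
mismatched = "'ss\""  # opens with ' but ends with "

for text in (sample_single, sample_double, mismatched):
    m = QUOTED.fullmatch(text)
    # Drop the outer quotes to get the inner text when there is a match.
    print(repr(text), "->", repr(m.group(0)[1:-1]) if m else None)
```

Since there is no capturing group, the sketch slices off the first and last character to get the text inside the quotes.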

tgoneil
1

One of the answers above is very accurate, but it needs a small update. Here it is:

(["'])((?:\\1|(?:(?!\1)).)*)(\1)

This will match every quoted string literal.

  • Now I want to match `{` or `}` braces, except those inside text matched by the pattern `(["'])((?:\\1|(?:(?!\1)).)*)(\1)` in the same string. I've been trying, but no luck so far. – Yogesh Sonawane Dec 20 '20 at 05:27