
I'm currently using (['\"])(?:\\1|.*?\\1) to capture quoted groups.

Text: "Hello", is it 'me youre looking for'?
# result: "Hello" (\1) and 'me youre looking for' (\2)

Additionally, I want it to ignore escaped quotes inside those groups (ignoring them globally would also be fine).

Text: "Hello", is it 'me you\'re looking for'?
# result: "Hello" (\1) and 'me you\'re looking for' (\2)

I'm using Python. I'm aware that this question is somewhat similar. However, I was unable to apply it to my existing regex.

Thanks, regex freaks!

Aron Woost

2 Answers


Here's a pattern:

(['"])(?:\\.|.)*?\1


Everything hinges on the (?:\\.|.) bit:

  • either match an escaped character: \\. - this handles both \" and \\
  • or any other (read: unescaped) character: . - you could also use [^\\] here.

Since the regex engine tries alternations from left to right, it'll try matching an escaped character first.
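To make the left-to-right alternation concrete, here is a minimal Python sketch that applies the pattern to the text from the question. The names QUOTED, STRICT and find_quoted are hypothetical, chosen only for this illustration. STRICT is the [^\\] variant mentioned above, which should stop the engine from backtracking into treating a backslash as an ordinary character.

```python
import re

# Pattern from this answer: an opening quote, then lazily either an escaped
# character (\\.) or any single character (.), then the same quote again.
QUOTED = re.compile(r"""(['"])(?:\\.|.)*?\1""")

# Variant using [^\\] instead of . so a backslash can only be consumed
# as part of an escape sequence.
STRICT = re.compile(r"""(['"])(?:\\.|[^\\])*?\1""")

def find_quoted(text, pattern=QUOTED):
    """Return every quoted substring, including its surrounding quotes."""
    # finditer + group(0) gives the whole match; findall would return only
    # the captured quote character because the pattern has one group.
    return [m.group(0) for m in pattern.finditer(text)]

sample = r'''"Hello", is it 'me you\'re looking for'?'''
print(find_quoted(sample))

# An unterminated string whose last quote is escaped: with ".", the engine
# can backtrack and accept the escaped quote as the closing one.
unterminated = r'"abc\"'
print(find_quoted(unterminated))          # matched by QUOTED
print(find_quoted(unterminated, STRICT))  # not matched by STRICT
```

If unterminated strings can occur in the input, the [^\\] variant is probably the safer choice.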

By the way, in your pattern, the \1| alternative in \1|.*?\1 was redundant, because the lazy .*? can already match zero characters; you could just have written .*?\1.
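The redundancy can be spot-checked in Python by comparing both forms on a few inputs. This is only a quick check on hand-picked samples, not a proof, and the names ORIGINAL, SIMPLIFIED and whole_matches are made up for the example.

```python
import re

# The pattern from the question and the simplified form suggested above.
ORIGINAL = re.compile(r"""(['"])(?:\1|.*?\1)""")
SIMPLIFIED = re.compile(r"""(['"]).*?\1""")

def whole_matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]

samples = [
    '"Hello", is it \'me youre looking for\'?',
    'empty "" and \'\' quotes',  # empty quoted strings, the case \1| targets
    'no quotes here',
]

for s in samples:
    # Both patterns should produce identical matches on every sample.
    assert whole_matches(ORIGINAL, s) == whole_matches(SIMPLIFIED, s)
```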

Lucas Trzesniewski

You could use the regex below.

(?<!\\)(['"])(?:\\\1|(?!\1).)*\1


  • (?<!\\) is a negative lookbehind which asserts that the match is not preceded by a backslash character.

  • (['"]) captures an unescaped single or double quote.

  • (?:\\\1|(?!\1).)* matches, zero or more times, either an escaped quote of the captured kind (\\\1) or any character that is not the captured quote ((?!\1).).

  • \1 refers back to the first captured character, so it matches the closing quote.

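In Python, re.findall returns the captured groups rather than the whole match when the pattern contains groups, so you need to wrap the whole pattern in an extra outer group (which shifts the backreference to \2), like below.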

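>>> import re
>>> def match(s):
...     # Group 1 is the whole quoted string; group 2 is the quote character.
...     for i in re.findall(r'''(?<!\\)((['"])(?:\\\2|(?!\2).)*\2)''', s):
...         print(i[0])
...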
>>> match(r""""Hello", is it 'me you\'re looking for'""")
"Hello"
'me you\'re looking for'
>>> match(r"""Hello\", is it 'me you\'re looking for'""")
'me you\'re looking for'
>>> 
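
As an alternative to adding an outer group for re.findall, the single-group pattern can be used as is with re.finditer, taking group(0) of each match. This is a minimal sketch of that option, not part of the session above, and the names QUOTED and quoted_strings are hypothetical.

```python
import re

# The single-group pattern from this answer, used without an outer group.
QUOTED = re.compile(r"""(?<!\\)(['"])(?:\\\1|(?!\1).)*\1""")

def quoted_strings(text):
    """Return every quoted substring whose opening quote is not escaped."""
    # group(0) is always the whole match, regardless of capturing groups.
    return [m.group(0) for m in QUOTED.finditer(text)]

print(quoted_strings(r""""Hello", is it 'me you\'re looking for'"""))
print(quoted_strings(r"""Hello\", is it 'me you\'re looking for'"""))
```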
Avinash Raj