I may be asking a repeated question, but I have not been able to find a solution to my problem, so please bear with me. I need to capture phrases enclosed in quotes with a regex. That is easy, but the problem arises when the quotes are not uniform, as in the following case: 'सीक्रेट सुपरस्टार' and ‘ डॉन 2 ’. I tried re.findall(r"['(.*?)' |‘(.*?)’] ",text), but this does not work. I need one regex that finds phrases enclosed in different types of quotes.

Seema Mudgil

1 Answer

You may use

(?:(')|(‘))(.*?)(?(1)'|(?(2)’))

See the regex demo.

Details

  • (?:(')|(‘)) - match and capture ' (put it into Group 1) or match and capture ‘ (put it into Group 2)
  • (.*?) - match any 0+ chars other than line break chars, as few as possible
  • (?(1)' - if Group 1 matched, match '
  • | - else
  • (?(2)’ - if Group 2 matched, match ’
  • )) - end of conditional construct.
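
To make the role of the conditional concrete, here is a minimal Python 3 sketch (an assumption on my part; the demo in this answer targets Python 2.7). The helper name `phrases` and the sample strings are hypothetical. It shows that a phrase is only captured when the closing quote corresponds to the opening one:

```python
import re

# The answer's pattern: Group 1 = straight opener, Group 2 = curly opener,
# Group 3 = the phrase; the conditional selects the closer that pairs with the opener.
rx = re.compile(r"(?:(')|(‘))(.*?)(?(1)'|(?(2)’))")

def phrases(text):
    """Return the Group 3 contents of every match, in order."""
    return [m.group(3) for m in rx.finditer(text)]

# Both pairs are consistent, so both phrases are captured.
print(phrases("'abc' and ‘def’"))   # ['abc', 'def']

# The first phrase opens with ' but closes with ’, so it is skipped.
print(phrases("'abc’ and ‘def’"))   # ['def']
```
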

See the Python 2.7 demo below:

# -*- coding: utf-8 -*-
import re

rx = ur'''(?:(')|(‘))(.*?)(?(1)'|(?(2)’))'''
s = u"'सीक्रेट सुपरस्टार' and ‘ डॉन 2 ’"
for x in re.finditer(rx, s):
    print(x.group(3).encode("utf8"))  # Group 3 holds the phrase between the quotes

Output:

सीक्रेट सुपरस्टार
 डॉन 2 
Wiktor Stribiżew
  • Thanks for the answer. But I need to add more conditions to check for phrases, such as text enclosed in " सुपरस्टार " or some other type of quotes. With the above solution I can capture only two conditions. Is there a way to include multiple conditions? – Seema Mudgil Aug 16 '17 at 06:57
  • Yes, just add more capturing groups as alternatives in the first `(?:...)` group, and add more checks to the conditional construct at the end. You might also try another way of matching the strings, like `["'‘](.*?)["'’]`. See [this Python demo](https://ideone.com/GKXCjL). Or even [`['"‘]([^'"‘]*)['"’]`](https://ideone.com/Y6Dazl). Check these regexes [**here**](https://regex101.com/r/5D4SpO/1). – Wiktor Stribiżew Aug 16 '17 at 07:02
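
For readers who need many quote styles, one possible way to scale this up is sketched below in Python 3. It swaps the conditional construct for a plain alternation with one capturing group per quote pair, which is a different technique from the one in the answer. The `QUOTE_PAIRS` table and the `quoted_phrases` helper are hypothetical names chosen for illustration. Unlike the character-class patterns in the comment above, this form still requires the closing quote to pair with the opening one.

```python
import re

# Hypothetical opener -> closer table; extend it with whatever pairs occur in the data.
QUOTE_PAIRS = {"'": "'", '"': '"', "‘": "’", "“": "”"}

# One alternative per pair, each with its own capturing group.
PATTERN = re.compile(
    "|".join(
        "{}(.*?){}".format(re.escape(opener), re.escape(closer))
        for opener, closer in QUOTE_PAIRS.items()
    )
)

def quoted_phrases(text):
    """Return the text inside each matched quote pair, in order of appearance."""
    phrases = []
    for m in PATTERN.finditer(text):
        # Only the group belonging to the alternative that matched is not None.
        phrases.append(next(g for g in m.groups() if g is not None))
    return phrases

sample = "'सीक्रेट सुपरस्टार' and ‘ डॉन 2 ’ and \" सुपरस्टार \""
print(quoted_phrases(sample))
# ['सीक्रेट सुपरस्टार', ' डॉन 2 ', ' सुपरस्टार ']
```

The captured phrases keep any padding spaces inside the quotes; call `.strip()` on each result if that is not wanted.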