
I've been looking for a way to find the nearest enclosing single or double quotes around a phrase in a paragraph. For example, for the phrase AAAAA:

I am "looking for" a way that doesn't break: "Lorem
ipsum\" AAAAA" in this case. Or this AAAAA case. Or this 'AAAAA' case.
Isn't this annoying?

The output would be:

"Lorem ipsum \" AAAAA"  |  AAAAA  |  'AAAAA'

I'm really looking for any good way to do it (regex/parser or any valid method will be gladly accepted).

I tried to get some inspiration from "How can I match a quote-delimited string with a regex?", but it wasn't really what I was looking for.

One thing I tried was the regex below, followed by code that keeps only the matches containing "AAAAA". It failed when there was another ' at the end of the sentence:

(["'])(?:\\\1|[\s\S])*?(AAAAA)?(?:\\\1|[\s\S])*?\1|AAAAA

If it's any help, I'm going to be using this solution in Python code.

Thanks!

cakelover
  • So what is the code you came up with? Please show the code to see what issue you have got. – Wiktor Stribiżew Apr 23 '21 at 08:13
  • Just a hint: you will have to use regex with a bit of code, or a parser. Doing this with just regex is too much pain, and most probably won't work without certain assumptions. – Wiktor Stribiżew Apr 23 '21 at 08:47
  • @WiktorStribiżew added. If there is anything else let me know. – cakelover Apr 25 '21 at 06:12
  • I thought you were looking for something like https://ideone.com/sBQzUc – Wiktor Stribiżew Apr 25 '21 at 12:54
  • That's almost it! Because I was trying to capture the surrounding quotes as well, I moved them inside the capturing group. One thing I noticed is the \b'\b part. Do you think there is any way to bypass it some other way? Imagine text with the phrase - """I used to love the song Space Truckin'. It is 'AAAAA'.""" -> Here the ' is at the end of a word and kinda messes up the solution. Again, thanks a ton! – cakelover Apr 25 '21 at 14:01
  • See https://ideone.com/K5y4sf or https://ideone.com/GLgvi7 – Wiktor Stribiżew Apr 25 '21 at 14:05
  • I actually can't thank you enough, and although I don't understand the regex completely (I guess I'm not proficient enough), this gives me the intended results. You can make this the answer and I'll accept. – cakelover Apr 25 '21 at 14:29

1 Answer


You can use

(?xs)
(?<!')(?:'{2})*\B('\b[^'\\]*(?:(?:\\.|\b'\b)[^'\\]*)*') # Single quoted string literal
|                                                       # or
(?<!")(?:"{2})*\B("\b[^"\\]*(?:(?:\\.|\b"\b)[^"\\]*)*") # Double quoted string literal

See the regex demo. Details:

  • (?xs) - verbose and dotall modes on
  • (?<!') - no ' allowed immediately on the left
  • (?:'{2})* - zero or more '' substrings
  • \B - a non-word boundary; since the next char is a quote (a non-word char), this requires the start of the string or a non-word char immediately to the left
  • ('\b[^'\\]*(?:(?:\\.|\b'\b)[^'\\]*)*') - Group 1, a single quoted string literal pattern:
    • '\b - a ' that must be followed with a word char
    • [^'\\]* - zero or more chars other than ' and \
    • (?:(?:\\.|\b'\b)[^'\\]*)* - zero or more repetitions of
      • (?:\\.|\b'\b) - a \ followed by any one char, or a ' with a word char on each side (an apostrophe inside a word, as in doesn't)
      • [^'\\]* - zero or more chars other than ' and \
    • ' - a ' char.
  • | - or
  • (?<!")(?:"{2})*\B("\b[^"\\]*(?:(?:\\.|\b"\b)[^"\\]*)*") - Group 2: a double quoted string literal (analogous to the preceding single quoted string literal pattern).
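To see what the word-boundary checks do in practice, here is a small sketch that runs the single-quote branch on its own. The sample sentences are made up for illustration (the last one is taken from the comments under the question) and are not part of the original demo:

```python
import re

# Single-quote branch of the pattern above, compiled on its own (no verbose mode).
single = re.compile(r"(?s)(?<!')(?:'{2})*\B('\b[^'\\]*(?:(?:\\.|\b'\b)[^'\\]*)*')")

samples = [
    "Or this 'AAAAA' case.",                                    # plain quoted word
    "Isn't this annoying?",                                     # apostrophe inside a word
    "He said 'it isn't AAAAA' ok.",                             # apostrophe inside a literal
    "I used to love the song Space Truckin'. It is 'AAAAA'.",   # trailing apostrophe
]

# findall returns the contents of Group 1 for every match.
results = [single.findall(s) for s in samples]
for sample, found in zip(samples, results):
    print(f"{sample!r} -> {found}")
```

An apostrophe with a word char on its left never opens a literal, because \B fails there. That is why Isn't and Truckin' are ignored, while the apostrophe in isn't is skipped inside the literal by \b'\b.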

See the Python demo:

import re
pattern = re.compile( r'''(?xs)
(?<!')(?:'{2})*\B('\b[^'\\]*(?:(?:\\.|\b'\b)[^'\\]*)*') # Single quoted string literal
|                                                       # or
(?<!")(?:"{2})*\B("\b[^"\\]*(?:(?:\\.|\b"\b)[^"\\]*)*") # Double quoted string literal
''')
 
text = "I am \"looking for\" a way that doesn't break: \"Lorem\nipsum\\\" AAAAA\" in this case. Or this AAAAA case. Or this 'AAAAA' case.\nIsn't this annoying?"
print(f"This is the text: {text}")
matches = [f'{x}{y}' for x,y in pattern.findall(text) if 'AAAAA' in f'{x}{y}']
print(matches)
# => ['"Lorem\nipsum\\" AAAAA"', "'AAAAA'"]
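
Note that the demo returns only the quoted matches, while the expected output in the question also lists the unquoted AAAAA. One possible way to get all three, sketched under the assumption that a bare occurrence should be reported as the phrase itself: add the escaped phrase as a third alternative, so quoted literals are consumed first and anything left over is a bare hit. find_enclosing is a hypothetical helper name, not something from the demo above.

```python
import re

# The two quoted-literal branches from the answer, written without verbose mode.
SINGLE = r"(?<!')(?:'{2})*\B('\b[^'\\]*(?:(?:\\.|\b'\b)[^'\\]*)*')"
DOUBLE = r'(?<!")(?:"{2})*\B("\b[^"\\]*(?:(?:\\.|\b"\b)[^"\\]*)*")'


def find_enclosing(text, phrase):
    """Hypothetical helper: return each quoted literal that contains `phrase`,
    or the bare phrase when it occurs outside any quoted literal."""
    bare = "(" + re.escape(phrase) + ")"          # Group 3: unquoted occurrence
    pattern = re.compile("|".join((SINGLE, DOUBLE, bare)), re.S)
    hits = []
    for m in pattern.finditer(text):
        # Exactly one of the three groups participates in each match.
        found = m.group(1) or m.group(2) or m.group(3)
        if phrase in found:
            hits.append(found)
    return hits


text = ("I am \"looking for\" a way that doesn't break: \"Lorem\nipsum\\\" AAAAA\" "
        "in this case. Or this AAAAA case. Or this 'AAAAA' case.\nIsn't this annoying?")
print(find_enclosing(text, "AAAAA"))
# => ['"Lorem\nipsum\\" AAAAA"', 'AAAAA', "'AAAAA'"]
```

This relies on the scan being left to right: an AAAAA inside a literal is swallowed by the literal's match before the third alternative ever gets a chance at it.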
Wiktor Stribiżew