I've been trying for a few days now to write a regex that captures sentences that start with a particular string and end at a disallowed character (<). The sentence may contain any punctuation (off the top of my head: []()-,.!?\/) and, most importantly, ' and ". However, it will always start with the same string and end with <. So my regex is as follows:
"starting string foo (?:[a-zA-z0-9_]|[-,.!?()\[\]\'\"\/]|[\s])+"
This works fine: it matches every sentence that starts with "starting string foo" and stops just before the < that follows. It successfully captures sentences containing every kind of punctuation... except double quotes ("). I don't understand why this is the case when it handles single quotes (') and other punctuation, e.g. slashes and dashes, without trouble.
For example, from the string
starting string foo Hubble revisits the famous "pillars of creation" with a new lens <
it only captures
starting string foo Hubble revisits the famous
but strings like
starting string foo Buzz Aldrin's self-portrait during Gemini 12 with the Earth reflecting off his visor, 12 November 1966 [2651x2632] <
with all kinds of punctuation (' - [ ,) it captures everything I want:
starting string foo Buzz Aldrin's self-portrait during Gemini 12 with the Earth reflecting off his visor, 12 November 1966 [2651x2632]
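
For reference, here is a minimal reproduction sketch. It assumes Python's `re` module (the post does not say which regex engine is in use), and the helper name `capture` is made up for illustration. The pattern is the one above, except that the letter range is written as `A-Z`; the `A-z` spelling would additionally admit `[ \ ] ^ _` and the backtick. Under these assumptions a plain ASCII double quote is matched without trouble, so one possible explanation (a guess, not something the post confirms) is that the real input does not contain a plain `"` at all, but something like an HTML entity (`&quot;`) or typographic quotes. Both of those hypothetical variants reproduce the truncated capture:

```python
import re

# Pattern from the post, with the letter range written as A-Z.
PATTERN = re.compile(
    r"""starting string foo (?:[a-zA-Z0-9_]|[-,.!?()\[\]\'\"\/]|[\s])+"""
)


def capture(text):
    """Return the matched text, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


# The sentence from the post, with plain ASCII double quotes.
plain = ('starting string foo Hubble revisits the famous '
         '"pillars of creation" with a new lens <')

# Hypothetical variants (assumptions, not taken from the post):
# the same sentence with the quotes HTML-escaped, and with curly quotes.
escaped = plain.replace('"', '&quot;')
curly = ('starting string foo Hubble revisits the famous '
         '\u201cpillars of creation\u201d with a new lens <')

for label, text in [("plain", plain), ("escaped", escaped), ("curly", curly)]:
    print(label, repr(capture(text)))
```

With plain quotes the capture runs all the way up to the `<`. In the other two cases it stops right after "the famous ", because neither `&` nor the curly quote characters are in any of the character classes. Printing `repr()` of the actual input string is a quick way to check which character is really there.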