Regex to get string between speechmarks anywhere within a string

Question

I am working on a project where I am trying to extract a string between two speechmarks anywhere within a string. It's almost working, except for one thing.

Let's say I have the line

"04\/06\/2019 17:56:45:\tTook 0 seconds to read lines for log 'Log Agent

The idea is that I'll do two regex matches: one for a quoted string that doesn't have an exclamation mark in front of it (for example, a regex search to match "Took 0 seconds"), and another to look for something like !"Took 0 seconds".

I have the following regex to look for a quoted string that isn't preceded by an exclamation mark:

$regex = '/[^.!](["\'])(?:(?=(\\\\?))\2.)*?\1/m';
$matches = null;
preg_match_all($regex, $this->searchString, $matches, PREG_SET_ORDER, 0);

But the above regex only matches the string if there is something before it; if the quoted string is right at the start, it doesn't find anything.

For example, if the search string is just "Took 0 seconds", it isn't found.

If it's some other content "Took 0 seconds", then the regex match correctly finds the string Took 0 seconds.
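The behaviour described above can be reproduced with a short PHP script. This is a sketch: the two sample strings are illustrative assumptions, and the pattern from the question is used on its own instead of being applied to $this->searchString. The cause is that [^.!] has to consume one character before the opening quote, and at offset 0 there is no such character.

```php
<?php
// Pattern from the question. [^.!] must consume one character before the
// opening quote, so a quote at offset 0 can never be the start of a match.
$original = '/[^.!](["\'])(?:(?=(\\\\?))\2.)*?\1/m';

// Illustrative inputs (assumed, based on the examples in the question).
$atStart   = '"Took 0 seconds"';
$afterText = 'some other content "Took 0 seconds"';

$countAtStart   = preg_match_all($original, $atStart, $m1);
$countAfterText = preg_match_all($original, $afterText, $m2);

echo $countAtStart, "\n";   // 0
echo $countAfterText, "\n"; // 1
// The full match includes the space consumed by [^.!]: ' "Took 0 seconds"'
echo $m2[0][0], "\n";
```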

So basically, what I am asking is how I can change my regex so that it extracts the string between the speech marks anywhere, even if it's right at the beginning.

UPDATE

To try and clarify what I'm doing: I'm creating a search parser to find certain strings within a database.

The search within the database will look either for individual keywords (not related to this question; that's easily done) or for a particular phrase. So if my search string is "took 0 seconds", the database would return any rows that contain took 0 seconds. If the search string is !"took 0 seconds", then I'd be able to query the database for rows that do not contain Took 0 seconds.

If my search string was keyword1 keyword2 "took 0 seconds" keyword3, then the regex should return "took 0 seconds".

Below is a regex101 link that gives some examples and shows the problem; you'll notice that the first one, where it is just "Took 0 seconds" on its own, doesn't get matched.

What are _speechmarks_? If you mean something like quotes, then your example doesn't seem to have anything to do with those. — AbraCadaver, Jun 04 '19 at 17:08
I've updated my question. Hopefully I've clarified a bit what I am trying to do — Boardy, Jun 04 '19 at 17:16
Not sure I understand well, but how about a negative lookbehind: `(?<![.!])(["\'])(?:(?=(\\\\?))\2.)*?\1` — Toto, Jun 04 '19 at 17:24
What if you just use this regex instead: `(took [0-9]+ seconds)`? See here: https://regex101.com/r/qYKqT0/4 — tcj, Jun 04 '19 at 17:27
Did you mean to use a tempered greedy token for only group 1 perhaps? `(?<![.!])(["\'])(?:(?!\1).)*\1` https://regex101.com/r/U1IC6u/2 — The fourth bird, Jun 04 '19 at 17:32
@tcj "took 0 seconds" was just an example; it could be absolutely anything. — Boardy, Jun 04 '19 at 17:49

score 3 · Accepted Answer · edited Jun 04 '19 at 18:14

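As @Toto pointed out in the comments, you could use a negative lookbehind instead of matching a character with the negated character class. [^.!] has to consume one character before the opening quote, so it can never match when the quote is the very first character of the string. The lookbehind (?<![.!]) only asserts that the character to the left, if there is one, is not . or !, so it also succeeds at the start of the string.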

What you might do is update your pattern to only make use of the first capturing group. Using a tempered greedy token, the pattern might look like:

(?<![.!])(["'])(?:(?!\1).)*\1

(?<![.!]) Negative lookbehind, assert that what is directly on the left is not . or !
(["']) Capture in group 1 either " or '
(?:(?!\1).)* Repeat 0+ times, matching any char as long as what is directly on the right is not group 1
\1 Match backreference to group 1

Regex demo
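A minimal PHP sketch of the tempered greedy token pattern applied to search strings like the ones in the question. The input strings are illustrative assumptions, and stripping the quotes with substr is one possible choice, not part of the answer. Arrow functions need PHP 7.4 or later.

```php
<?php
// Tempered greedy token pattern from the answer, written as a PHP
// single-quoted string: \' inside the string becomes a plain ' in the regex.
$pattern = '/(?<![.!])(["\'])(?:(?!\1).)*\1/';

// Illustrative search strings based on the examples in the question.
$inputs = [
    '"Took 0 seconds"',                            // phrase at the very start
    'keyword1 keyword2 "took 0 seconds" keyword3', // phrase in the middle
    '!"took 0 seconds"',                           // negated phrase, skipped
];

$results = [];
foreach ($inputs as $input) {
    preg_match_all($pattern, $input, $matches);
    // $matches[0] holds the full matches including the quotes;
    // substr($m, 1, -1) drops the opening and closing quote.
    $results[$input] = array_map(fn (string $m): string => substr($m, 1, -1), $matches[0]);
    echo $input, ' => ', json_encode($results[$input]), "\n";
}
```

With this pattern the first input yields Took 0 seconds, the second yields took 0 seconds, and the negated third input yields no match, because the opening quote is preceded by !.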

Note that due to the * quantifier, it will also match an empty pair of quotes "".

Another way to get those matches could be to use a non-greedy match .*? followed by the backreference \1 to group 1:

(?<![.!])(["\']).*?\1

Regex demo
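The question's plan of two separate matches (plain phrases and !-negated phrases) can be sketched in PHP using the non-greedy pattern above for the plain phrases. The second pattern for negated phrases, /!(["\']).*?\1/, and the helper name parseQuotedPhrases are hypothetical, chosen for illustration; they are not part of the answer. Arrow functions need PHP 7.4 or later.

```php
<?php
// Hypothetical helper: splits a search string into quoted phrases to
// include and !-negated quoted phrases to exclude.
function parseQuotedPhrases(string $search): array
{
    // Non-greedy pattern from the answer: quoted phrase not preceded by . or !
    $includePattern = '/(?<![.!])(["\']).*?\1/';
    // Assumed pattern for negated phrases such as !"took 0 seconds"
    $excludePattern = '/!(["\']).*?\1/';

    preg_match_all($includePattern, $search, $include);
    preg_match_all($excludePattern, $search, $exclude);

    return [
        // Drop the surrounding quotes.
        'include' => array_map(fn (string $m): string => substr($m, 1, -1), $include[0]),
        // Drop the leading ! and the surrounding quotes.
        'exclude' => array_map(fn (string $m): string => substr($m, 2, -1), $exclude[0]),
    ];
}

$result = parseQuotedPhrases('keyword1 "took 0 seconds" !"log agent" keyword3');
print_r($result);
```

Because the two patterns run independently, a negated phrase immediately followed by another quoted phrase, such as !"a" "b", also makes the first pattern return a spurious " " match (from the closing quote of "a" to the opening quote of "b"), so this sketch would need extra handling for such input.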

