
I am working on a project where I am trying to extract a string between two speech marks anywhere within a string. It's almost working except for one thing.

Let's say I have the line:

"04\/06\/2019 17:56:45:\tTook 0 seconds to read lines for log 'Log Agent

The idea is to do two regex matches: one for a quoted string that is not preceded by an exclamation mark, for example "Took 0 seconds", and another for a negated one such as !"Took 0 seconds".

I have the following regex to find a quoted string that does not start with an exclamation mark:

$regex = '/[^.!](["\'])(?:(?=(\\\\?))\2.)*?\1/m';
$matches = null;
preg_match_all($regex, $this->searchString, $matches, PREG_SET_ORDER, 0);

But the above regex only matches the quoted string if there is something before it. If the quoted string is right at the start, nothing is found.

E.g. if the search string is "Took 0 seconds", it doesn't get found.

If it's some other content "Took 0 seconds", then the regex match correctly finds the string Took 0 seconds.

So basically what I am asking is: how can I change my regex so that it extracts the string between the speech marks anywhere, even if it's right at the beginning?

UPDATE

To try and clarify what I'm doing: I'm creating a search parser to find certain strings within a database.

The search will either look for individual keywords (not related to this question, that's easily done) or for a particular string within the database. So if my search string is "took 0 seconds", the database would return any rows that contain took 0 seconds. If the search string is !"took 0 seconds", I'd be able to check the database for rows that do not contain took 0 seconds.

If my search string was keyword1 keyword2 "took 0 seconds" keyword 3, then the regex would return "took 0 seconds".

Below is a regex101 link that gives some examples and shows what the problem is. You'll notice the first one, where it is just "Took 0 seconds" on its own, doesn't get matched.

Boardy

1 Answer


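As @Toto pointed out in the comments, you could use a negative lookbehind instead of matching the preceding character with a character class. The negated class [^.!] has to consume one character, so the pattern cannot match when the quote is the first character of the string. A lookbehind only asserts what is to the left, so it also succeeds at the start of the string.

What you might do is update your pattern to only make use of the first capturing group. As your pattern makes use of a tempered greedy token solution, the pattern might look like: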

(?<![.!])(["'])(?:(?!\1).)*\1
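  • (?<![.!]) Negative lookbehind, assert that what is directly to the left is not . or !
  • (["']) Capture in group 1 either " or '
  • (?:(?!\1).)* Loop 0+ times matching any char while what is directly to the right is not what group 1 captured
  • \1 Match backreference to group 1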
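As a minimal sketch of the difference, the snippet below runs the pattern from the question and the lookbehind pattern against the two inputs described in the question. The helper name countMatches and the variable names are only for illustration and are not part of the original code.

```php
<?php
// Pattern from the question: [^.!] must consume one character before the quote.
$original = '/[^.!](["\'])(?:(?=(\\\\?))\2.)*?\1/m';
// Lookbehind pattern from this answer: checks the preceding character without consuming it.
$lookbehind = '/(?<![.!])(["\'])(?:(?!\1).)*\1/';

// Hypothetical helper: number of matches of $regex in $subject.
function countMatches(string $regex, string $subject): int
{
    return (int) preg_match_all($regex, $subject, $matches, PREG_SET_ORDER);
}

$bare     = '"Took 0 seconds"';                     // quote is the first character
$prefixed = 'some other content "Took 0 seconds"';  // quote has text before it

printf("original   / bare:     %d\n", countMatches($original, $bare));
printf("original   / prefixed: %d\n", countMatches($original, $prefixed));
printf("lookbehind / bare:     %d\n", countMatches($lookbehind, $bare));
printf("lookbehind / prefixed: %d\n", countMatches($lookbehind, $prefixed));
```

With PREG_SET_ORDER, each match's full text (including the quotes) is at index 0 and the quote character is at index 1.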

Regex demo

Note that due to the * quantifier it will also match an empty pair of quotes, "".

Another way to get those matches could be to use a non-greedy match .*? followed by the backreference to group 1, \1:
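If empty phrases are not wanted, one possible way to handle this (an assumption on my side, not something the question asks for) is to wrap the text between the quotes in an extra capturing group and skip matches where that group is empty. The sample search string is made up for illustration.

```php
<?php
// Same lookbehind pattern, with an added group 2 around the text between the quotes.
$regex = '/(?<![.!])(["\'])((?:(?!\1).)*)\1/';

// Hypothetical search string containing an empty pair of quotes.
$search = 'keyword1 "" "took 0 seconds"';

preg_match_all($regex, $search, $matches, PREG_SET_ORDER);

$phrases = [];
foreach ($matches as $match) {
    // $match[2] is the text between the quotes; skip the empty "" match.
    if ($match[2] !== '') {
        $phrases[] = $match[2];
    }
}

print_r($phrases);
```

Filtering after the match keeps the pattern itself unchanged, so the quotes still pair up the same way as before.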

(?<![.!])(["']).*?\1

Regex demo
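One thing to watch with the lookbehind patterns: when a search string mixes a negated phrase with other text, the closing quote of the negated phrase is not preceded by ! and can start a match of its own. For the search parser described in the question, a possible alternative is a single pass that captures an optional leading ! together with the phrase. This is only a sketch under my own assumptions: the pattern is a variation that is not part of the patterns above, it drops the check for a preceding dot, and the variable names and sample search string are hypothetical.

```php
<?php
// Assumed variation: group 1 is an optional leading "!", group 2 the quote,
// group 3 the phrase between the quotes.
$regex = '/(!?)(["\'])((?:(?!\2).)*)\2/';

// Hypothetical search string mixing keywords, a negated phrase and a plain phrase.
$search = 'keyword1 !"took 0 seconds" "log agent" keyword3';

preg_match_all($regex, $search, $matches, PREG_SET_ORDER);

$include = []; // phrases the rows must contain
$exclude = []; // phrases the rows must not contain
foreach ($matches as $match) {
    if ($match[1] === '!') {
        $exclude[] = $match[3];
    } else {
        $include[] = $match[3];
    }
}

print_r(['include' => $include, 'exclude' => $exclude]);
```

Because each quoted phrase is consumed together with its optional !, a closing quote cannot be picked up again as the start of another phrase.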

Toto
The fourth bird