Related question: How can I use regex to match a character (') when not following a specific character (?)?

I'm parsing a log with regex (the PHP PCRE library) and trying to extract a URL from it. The URL is enclosed in double quotes ("), but some of the requests also contain backslash-escaped double quotes (\") inside the URL. For example:

"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""

My first pattern was basically:

#\"([^\"]*)\"#

This worked well, until I reached one of the entries as above, and it truncated the match so all I got was:

https://www.amh.net.au/online/dbSearch.php?t=all&q=\
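
The truncation can be reproduced with a minimal sketch. It uses Python's `re` module rather than PHP, on the assumption that this simple pattern behaves the same in both engines; the variable names are only illustrative.

```python
import re

# The log entry from the example, with backslash-escaped inner quotes.
line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'

# Same pattern as #\"([^\"]*)\"# without the PHP delimiters.
first_pattern = re.compile(r'"([^"]*)"')

# [^"]* stops at the first inner quote, so the capture ends at the backslash.
truncated = first_pattern.search(line).group(1)
print(truncated)
```

The capture ends at the backslash because the negated class treats the escaped quote as the closing delimiter.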

After digging around, rediscovering the regex cheatsheets at http://addedbytes.com, and finding some more useful information at http://www.regular-expressions.info/lookaround.html, I have now tried the following look-behind:

#"([(?<!\\)"]*)"#

But now all I get is "" as the full match and an empty string as the captured group.

HorusKol

2 Answers

You placed your lookbehind inside a character class ([]), so it is not interpreted as a lookbehind at all. Inside the brackets, each of those characters is just a literal that the class is allowed to match.
Basically, I think you'd like something like this:

#"((?:[^"]|(?<=\\)")*)"#

Be aware, though, that this is fooled by input such as \\", where the backslash before the quote is itself escaped and the quote really does end the string.
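
A rough sketch of the lookbehind approach and the \\" caveat, written with Python's `re` module on the assumption that it treats these constructs the same way PCRE does. The lookbehind pattern here carries a capture group and a `*` quantifier so that it captures the whole URL. The second pattern, `consume_pairs`, is not from this answer; it is a commonly used alternative that consumes each backslash together with the character after it. The `tricky` input is made up for illustration.

```python
import re

# Lookbehind variant: a quote inside the value is accepted only
# when the character immediately before it is a backslash.
lookbehind = re.compile(r'"((?:[^"]|(?<=\\)")*)"')

# Alternative (an assumption, not part of the answer): treat a backslash
# plus the following character as one unit, so \\ cannot escape a quote.
consume_pairs = re.compile(r'"((?:[^"\\]|\\.)*)"')

url_line = r'"https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\""'
tricky = r'"C:\\" "next"'  # hypothetical value ending in an escaped backslash

url_result = lookbehind.search(url_line).group(1)
overrun = lookbehind.search(tricky).group(1)    # runs past the real closing quote
exact = consume_pairs.search(tricky).group(1)   # stops at the real closing quote

print(url_result)
print(overrun)
print(exact)
```

On the URL from the question both patterns capture the same text. They differ only on input like `tricky`, where the lookbehind sees a backslash before the quote and keeps going.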

Loamhoof

The URLs in the logs would be URL-encoded, so they should not contain literal spaces. As such, the following pattern, which matches a quoted run of non-space characters, should work:

#\"([^ ]*)\"#
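
A quick check of this idea, sketched with Python's `re` module on the assumption that the pattern behaves the same in PCRE. The log line is hypothetical: the IP address and status code around the URL are invented, since the real log format is not shown.

```python
import re

# Same pattern as #\"([^ ]*)\"# without the PHP delimiters.
no_space = re.compile(r'"([^ ]*)"')

# Hypothetical log line: only the quoted URL comes from the question.
log_line = r'203.0.113.7 "https://www.amh.net.au/online/dbSearch.php?t=all&q=\"Rosuvastatin\"" 200'

# [^ ]* runs to the next space, then backtracks to the last quote before it.
url = no_space.search(log_line).group(1)
print(url)
```

This relies on the quoted field containing no spaces and being followed by a space or the end of the line; two quoted fields with no space between them would be captured as one.
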
devnull