
I want to extract an attribute value from a simple text format I'm parsing. The quoted value should be allowed to contain HTML as well, and that is where I'm stuck right now.

$line = 'attribute = "<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"';



I've already reduced the line to the substring that starts at the value:

$line = '"<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"';

My current regex works as long as there are no escaped quotes inside the value. However, once I escape the quotes in the HTML, it no longer works. Using a greedy .* instead runs on to the end of the second attribute.

What I'm trying to obtain from the string above is

$result = '<p class=\"qwerty\">Hello World</p>';



This is how far I've gotten through trial and error:

$value_regex = "/^\"(.+?)\"/";

if (preg_match($value_regex, $line, $matches)) {
    $result = $matches[1];
}

Thank you very much in advance!

Grozav Alex Ioan

1 Answer


You can use a negative lookbehind to avoid matching escaped quotes:

(?<!\\)"(.+?)(?<!\\)"


Here (?<!\\) is a negative lookbehind that keeps a quote preceded by a backslash (\") from being matched as a delimiter.

However, I would caution against using regex to parse HTML; a DOM parser is a better tool for that.


PHP Code:

$value_regex = '~(?<!\\\\)"(.+?)(?<!\\\\)"~';
if (preg_match($value_regex, $line, $matches)) {
    $result = $matches[1];
}
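
As a quick check, the pattern can be run against the substring from the question. This is only a sketch that reuses the $line value quoted in the question; the null default for $result is an addition for illustration.

```php
<?php
// Substring from the question. Single quotes keep the backslashes literal.
$line = '"<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"';

// Same pattern as above: neither delimiter may be preceded by a backslash.
$value_regex = '~(?<!\\\\)"(.+?)(?<!\\\\)"~';

$result = null; // illustrative default, not part of the original snippet
if (preg_match($value_regex, $line, $matches)) {
    $result = $matches[1];
}

echo $result, PHP_EOL; // prints: <p class=\"qwerty\">Hello World</p>
```

The escaped quotes stay in the captured value as written, which matches the result the question asks for.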
anubhava
  • Can you update your answer to set the regex to a variable? I'm getting an error and I think I might've escaped the wrong way. Thank you! – Grozav Alex Ioan Oct 31 '14 at 14:06
  • This solution does not work reliably. What if the last char in the string is an escaped escape? e.g. `'stuff "string\\" string'`. Note also that this question has been asked and answered before; see [PHP: Regex to ignore escaped quotes within quotes](http://stackoverflow.com/a/5696141/433790) – ridgerunner Oct 31 '14 at 14:26
  • For those cases regex would be: `(?<!(?<!\\)\\)"(.+?)(?<!(?<!\\)\\)"` as in [this demo](http://regex101.com/r/dH3xY0/3) – anubhava Oct 31 '14 at 14:44
  • @ridgerunner I didn't consider the case you mentioned. Thank you for pointing that out! – Grozav Alex Ioan Nov 02 '14 at 15:21
  • @anubhava Once again, thank you! Really happy for the help! I've managed to get my lexer to be completely working! :) – Grozav Alex Ioan Nov 02 '14 at 15:21
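
The escaped-backslash case raised in the comments can also be handled without lookbehinds, by consuming every backslash together with the character that follows it. The following is a minimal sketch under stated assumptions, not the method from the answer or from the linked thread: the function name extract_quoted_value is hypothetical, and the nullable type hints assume PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: returns the contents of the first double-quoted
 * value in $line, or null if no closed quoted value is found.
 */
function extract_quoted_value(string $line): ?string
{
    // Inside the quotes, accept either a character that is neither a quote
    // nor a backslash, or a backslash followed by any single character.
    // An escaped backslash (\\) is therefore consumed as one pair, so the
    // quote after it is correctly treated as the closing delimiter.
    $pattern = '~"((?:[^"\\\\]|\\\\.)*)"~s';

    if (preg_match($pattern, $line, $matches)) {
        return $matches[1];
    }
    return null;
}

// The substring from the question.
echo extract_quoted_value('"<p class=\"qwerty\">Hello World</p>" attribute2 = "value2"'), PHP_EOL;
// prints: <p class=\"qwerty\">Hello World</p>

// A value ending in an escaped backslash: the actual string is
// stuff "string\\" string (four backslashes in PHP source give two).
echo extract_quoted_value('stuff "string\\\\" string'), PHP_EOL;
// prints: string\\
```

Because the two alternatives inside the group can never match the same character, the pattern does not need to backtrack between them. Note that the search is unanchored, so it assumes the text before the value contains no stray quotes.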