I need to get the value of every "title" attribute in an HTML page. I use Twig templates, so the source code can look like this:

<a href="#" title="some {% func "smth" %} text">

I use this regex to get the title value:

/<[a-z]+[^>]*\s+(title|alt)\s*=\s*("[^"]*")/ 

but when the title contains {% func "smth" %}, I get the following string:

"some {% func "

How do I get the full string?

Update: DOM isn't a solution because it will interpret the example link above as

<a href="#" title="some {% func " smth text></a>
Gordon
samrockon
  • A lot of parsing HTML with regex questions these days... – Phil Jul 25 '11 at 12:30
  • Another [obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) in less than 12 hours. Is it Regex University graduation day again? – Kerrek SB Jul 25 '11 at 12:49
  • @Kerrek The accepted solution in that link is wrong. [Regex can parse HTML](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491). Since we have parsers readily available, you don't want to go through that effort though. Also, the OP isn't parsing HTML but Twig templates, which cannot be parsed by DOM. – Gordon Jul 25 '11 at 12:50
  • @Gordon: fair enough - it all depends on what the OP means by "the code can be *like*". If it's exactly and always that particular snippet, then sure, go ahead and regex. – Kerrek SB Jul 25 '11 at 12:52
  • I am not parsing HTML with regex; I just need to get the value of an HTML attribute, and I think regex can do that. The only question is how to get the value from both title="title" and title="title {% func "val" %}" – samrockon Jul 27 '11 at 07:53

1 Answer

This seems to work for me:

/<[a-z]+[^>]*\s+(title|alt)\s*=\s*(".*")/ 

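The problem was that [^"] rejects every quote, including the ones around "smth", so the match stops at the first inner quote. With .* the engine backtracks to a closing ", so the whole value is captured. Note that .* is greedy: it runs to the last " on the line, so this only works when the title is the last quoted attribute on that line.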
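
To make the difference concrete, here is a minimal Python sketch that runs the pattern from the question, the pattern from this answer, and a third variant on a sample tag. The third pattern (TWIG_AWARE), the helper attribute_values, and the extra class attribute on the sample are assumptions added for illustration; they are not part of the original question or answer. The idea is to consume each {% ... %} or {{ ... }} block as a single unit, so the quotes inside it do not end the attribute value.

```python
import re

# Pattern from the question: [^"]* stops at the first inner quote.
ORIGINAL = re.compile(r'<[a-z]+[^>]*\s+(title|alt)\s*=\s*("[^"]*")')

# Pattern from this answer: greedy .* runs to the last quote on the line.
GREEDY = re.compile(r'<[a-z]+[^>]*\s+(title|alt)\s*=\s*(".*")')

# Hypothetical alternative (an assumption, not from the thread):
# treat each {% ... %} or {{ ... }} block as one unit inside the value.
TWIG_AWARE = re.compile(
    r'<[a-z]+[^>]*\s+(title|alt)\s*=\s*'
    r'("(?:\{%.*?%\}|\{\{.*?\}\}|[^"])*")'
)


def attribute_values(pattern, html):
    """Return (attribute name, quoted value) pairs matched by pattern."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(html)]


if __name__ == "__main__":
    # Sample tag with an extra attribute after title (assumed for the demo).
    sample = '<a href="#" title="some {% func "smth" %} text" class="link">'
    for name, pattern in [
        ("original", ORIGINAL),
        ("greedy", GREEDY),
        ("twig-aware", TWIG_AWARE),
    ]:
        print(name, attribute_values(pattern, sample))
```

On this sample the greedy pattern also swallows the class attribute, while the Twig-aware variant stops at the quote that closes the title. The variant still assumes that a Twig block never contains the literal sequence %} or }} inside one of its own string arguments, and that the tag does not contain > before the attribute.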

kevlar1818
  • But as has been said quite clearly and forcibly above, you might not want to use regex to parse HTML. – kevlar1818 Jul 25 '11 at 16:10