Say I have the following string:

<a name="anchor" title="anchor title">

Currently I can extract name and title with strpos and substr, but I want to do it right. How can I do this with regex? And what if I wanted to extract from many of these tags within a block of text?

I've tried this regex:

/name="([A-Z,a-z])\w+/g

But the match includes the name=" part as well; I just want the value.

NotaGuruAtAll

2 Answers

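The regex (\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']? can be used to extract all attributes. Capture group 1 holds the attribute name and capture group 2 holds the value, so reading group 2 gives you the value without the name=" prefix.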
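A minimal sketch of how that regex could be applied to several tags at once with preg_match_all. The sample input, the `~` delimiter, and the variable names ($pattern, $html, $attributes) are assumptions chosen for illustration. Note that the pattern is not tag-aware: it matches any name=value pair it finds in the text, not only those inside <a> tags.

```php
<?php
// The attribute regex from this answer, wrapped in ~ delimiters.
// Inside a single-quoted PHP string only the ' characters need escaping.
$pattern = '~(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?~';

// Hypothetical input: two anchor tags in one block of text.
$html = '<a name="anchor" title="anchor title"> <a name="second" title="second title">';

$attributes = array();
// PREG_SET_ORDER yields one array per match: [0] full match, [1] name, [2] value.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $attributes[] = array($m[1], $m[2]);
    }
}

print_r($attributes);
```

Each entry in $attributes is a (name, value) pair, in the order the attributes appear in the input.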

Brian H

DOMDocument example:

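<?php
// Collect the title attribute of every <a> element in the markup.
$titles = array();
$doc = new DOMDocument();
// The PHP string is single-quoted so the double quotes inside the HTML do not end it early.
$doc->loadHTML('<html><body>Test<br><a name="anchor" title="anchor title"></a></body></html>');
$links = $doc->getElementsByTagName('a');
if ($links->length != 0) {
    foreach ($links as $a) {
        $titles[] = $a->getAttribute('title');
    }
}
?>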

You commented: "I'm actually parsing the data before the page is rendered so DOM is not possible, right?"

It is still possible. We're working with the scraped HTML as a string, so we construct a DOM from it with these functions and query it like XML; nothing needs to be rendered in a browser first.

Good examples in the comments here: http://php.net/manual/en/domdocument.getelementsbytagname.php
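For the "many tags within a block of text" part of the question, the same approach can be sketched as follows, reading both name and title from every <a> element. The sample markup and the $anchors variable are assumptions for illustration. libxml_use_internal_errors(true) is used here only to keep parser warnings about imperfect markup out of the output.

```php
<?php
// Hypothetical block of text containing several anchor tags.
$html = '<p>Intro <a name="anchor" title="anchor title">one</a> and '
      . '<a name="second" title="second title">two</a></p>';

// Buffer libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$anchors = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    // getAttribute() returns an empty string when the attribute is missing.
    $anchors[] = array(
        'name'  => $a->getAttribute('name'),
        'title' => $a->getAttribute('title'),
    );
}

print_r($anchors);
```

Unlike a regex over the raw text, this only looks at attributes that belong to <a> elements, so name=value pairs elsewhere in the text are ignored.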

Twisty