
I have a bunch of strings that may or may not have a substring similar to the following:

<a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a>

I'm trying to retrieve the 5 at the end of the link. It isn't necessarily a single digit; it can be a very large number. The string will vary: the text before and after the link will always be different. The only parts that stay the same are <a class="tag" href="http://www.yahoo.com/ and the closing </a>.

Musa
Jonah Katz

4 Answers


Give parse_url() a try on the link's href. It should be easy from there.
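A minimal sketch of how that could look. parse_url() works on a single URL, not on arbitrary text, so this assumes the href values are first pulled out with a simple regex, and that the host is always www.yahoo.com as in the question. The helper name extractIds() is hypothetical.

```php
<?php
// Sketch only. Assumptions: hrefs are extracted with a regex first, and the
// host is www.yahoo.com. extractIds() is a hypothetical helper name.
function extractIds(string $subject): array
{
    $ids = [];
    // Collect the href of every <a class="tag" ...> link in the string.
    preg_match_all('~<a class="tag" href="([^"]*)"~', $subject, $m);
    foreach ($m[1] as $href) {
        $host = parse_url($href, PHP_URL_HOST); // e.g. "www.yahoo.com"
        $path = parse_url($href, PHP_URL_PATH); // e.g. "/5"
        if ($host !== 'www.yahoo.com' || !is_string($path)) {
            continue;
        }
        $id = ltrim($path, '/');
        // Keep purely numeric ids as strings, so very large numbers are
        // not mangled by an integer cast.
        if (ctype_digit($id)) {
            $ids[] = $id;
        }
    }
    return $ids;
}

$subject = 'intro <a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a> middle '
         . '<a class="tag" href="http://www.yahoo.com/123456789012345678901"> more</a> end';
print_r(extractIds($subject));
```

The host check is what parse_url() adds over a plain regex: links to other sites are skipped even if their path happens to be numeric.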

nook

You can do it using preg_match_all() and the regular expression <a class="tag" href="http:\/\/([^"]*)\/(\d+)">. The number ends up in the second capturing group.
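A minimal runnable sketch of the preg_match_all() approach. It assumes the domain is always www.yahoo.com, as in the question, so it spells out the fixed prefix literally instead of using a generic group for the host; the sample string is made up.

```php
<?php
// Sketch of the preg_match_all() approach, anchored on the fixed part of
// the link: <a class="tag" href="http://www.yahoo.com/ ... ">
$subject = 'foo <a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a> bar '
         . '<a class="tag" href="http://www.yahoo.com/987654321"> baz</a> qux';

// Dots are escaped so they match literal dots; ~ is used as the delimiter
// so the slashes need no escaping. (\d+) captures the number, however long.
$pattern = '~<a class="tag" href="http://www\.yahoo\.com/(\d+)">~';

$count = preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full matched tags, $matches[1] the captured numbers.
foreach ($matches[1] as $id) {
    echo $id, "\n";
}
```

Since the whole pattern sits inside one tag and \d+ cannot cross the closing quote, several links on the same line are matched separately.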

Prasanth

I would go with basename():

// prints passwd
print basename("/etc/passwd") . "\n";
// prints 5
print basename("http://www.yahoo.com/5") . "\n";

To get the link itself, you could use SimpleXML (this only works if the string is well-formed XML):

$xml  = simplexml_load_string( '<a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a>' );
$attr = $xml->attributes();
print $attr['href'];

Finally, if you don't know the overall structure of the string, use this:

$dom = new DOMDocument;
$dom->loadHTML( '<a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a>asasasa<a class="tag" href="http://www.yahoo.com/6"> blah blah ...</a>' );
$nodes = $dom->getElementsByTagName('a');
foreach ($nodes as $node) {
    // the full link, e.g. http://www.yahoo.com/5
    print $node->getAttribute('href') . "\n";
    // only the trailing number, e.g. 5
    print basename( $node->getAttribute('href') ) . "\n";
}

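This also works on invalid HTML, because DOMDocument::loadHTML() is lenient and repairs broken markup as it parses (it may emit warnings while doing so).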
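A slightly stricter variant of the DOMDocument approach, sketched under the assumption that only links with class="tag" and the fixed http://www.yahoo.com/ prefix from the question should count. It checks the prefix instead of relying on basename() alone, silences loadHTML() warnings via libxml_use_internal_errors(), and skips any href whose tail is not purely numeric. The helper name tagIds() is hypothetical.

```php
<?php
// Sketch: keep only <a class="tag"> links whose href starts with the fixed
// prefix from the question. tagIds() is a hypothetical helper name.
function tagIds(string $html): array
{
    $prefix = 'http://www.yahoo.com/';
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // "" if the attribute is missing
        if ($a->getAttribute('class') !== 'tag' || strpos($href, $prefix) !== 0) {
            continue;
        }
        $id = substr($href, strlen($prefix));
        // Numeric ids stay strings, so arbitrarily long ones are preserved.
        if (ctype_digit($id)) {
            $ids[] = $id;
        }
    }
    return $ids;
}

$html = 'text <a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a>'
      . ' <a href="http://example.com/7">other</a>'
      . ' <a class="tag" href="http://www.yahoo.com/6"> blah blah ...</a> more';
print_r(tagIds($html));
```

The link to example.com is ignored because it has neither the class nor the prefix.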

insertusernamehere

As you only need to retrieve the 5, it's pretty straightforward:

$r = preg_match_all('~\/(\d+)"~', $subject, $matches);

The numbers are then in the first capturing group, $matches[1].
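A small runnable version of this short pattern, on a made-up sample string, shows where the results end up:

```php
<?php
// The short pattern: it captures the digits between a "/" and the closing
// quote of an attribute value.
$subject = 'a <a class="tag" href="http://www.yahoo.com/5"> blah</a> b '
         . '<a class="tag" href="http://www.yahoo.com/42"> blah</a> c';

$r = preg_match_all('~/(\d+)"~', $subject, $matches);

// $r is the number of matches; the numbers are in $matches[1] as strings.
var_dump($r);          // int(2)
var_dump($matches[1]);
```

Because the pattern is not tied to the <a class="tag"> link, it would also pick up a value such as src="/img/7" elsewhere in the string, which is one reason to prefer an HTML parser when the surrounding text is unknown.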

If you need more information, like the link text, I suggest using an HTML parser:

require('Net/URL2.php');

$doc = new DOMDocument();
$doc->loadHTML('<a class="tag" href="http://www.yahoo.com/5"> blah blah ...</a>');
foreach ($doc->getElementsByTagName('a') as $link)
{
    $url = new Net_URL2($link->getAttribute('href'));
    if ($url->getHost() === 'www.yahoo.com') {
        $path = $url->getPath();
        printf("%s (from %s)\n", basename($path), $url);
    }
}

Example Output:

5 (from http://www.yahoo.com/5)
hakre
  • But I need to get the link out of the string – Jonah Katz Aug 07 '12 at 22:27
  • In your question you wrote that you need to get the 5, so I took you at your word. For the link I suggest an HTML parser: [Robust, Mature HTML Parser for PHP](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) – hakre Aug 07 '12 at 22:28
  • "But, this string will vary. The text before the link, and after, will always be different" – Jonah Katz Aug 07 '12 at 22:29
  • But I appreciate your answer; I'll try to take it from there – Jonah Katz Aug 07 '12 at 22:29
  • @JonahKatz: I added a working example with PHP's HTML parser and PEAR's [`Net_URL2` component](http://pear.php.net/package/Net_URL2/). – hakre Aug 07 '12 at 22:36