I'm looking for a way to transform this:

...<a href="showinfo:3875//[integer]">[inner content]</a>...

Into this:

...<a href="http://somelink.com/[inner content]">[inner content]</a>...

The content contains multiple `a` links with other showinfo:[integer] values. (I can process those myself.)

Thanks for any help, Bálint

Edit: Thanks to Kaiser's answer, here is the working snippet:

// Function name is hypothetical; the original snippet omitted its header.
function convert_showinfo_links( $a ) {
    $html = $a;

    $dom = new \DOMDocument;
    @$dom->loadHTML( $html ); // Cannot guarantee all-valid input, so parser warnings are suppressed

    foreach ( $dom->getElementsByTagName( 'a' ) as $tag ) {
        // Fixed the strstr() argument order (haystack first) and added a != false check
        if ( $tag->hasAttribute( 'href' ) && strstr( $tag->getAttribute( 'href' ), 'showinfo:3875' ) != false ) {
            $tag->setAttribute( 'href', "http://somelink.com/{$tag->textContent}" );
        }
    }

    // Serialize the whole modified document
    return $dom->saveHTML();
}
molbal

1 Answer

You can use DOMDocument as a reliable and fast way to handle DOM nodes and their attributes. Hint: for HTML, this is usually more reliable than (most) regular expressions.

// Your original HTML
$html = '<a href="showinfo:3875//[integer]">[inner content]</a>';

$dom = new \DOMDocument;
$dom->loadHTML( $html );

Now that you have your DOM ready, you can use either the DOMDocument methods or DOMXPath to search through it and obtain your target element.

Example with XPath:

$xpath = new \DOMXPath( $dom );
// Alter the query to your needs; attributes are addressed with @ in XPath
$el = $xpath->query( "//a[starts-with(@href, 'showinfo:3875')]" );
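To show how the XPath route could go all the way from input to output, here is a minimal sketch. The function name `rewrite_showinfo_links` and the sample markup are assumptions made for illustration; the `showinfo:3875` prefix and the `http://somelink.com/` target come from the question.

```php
<?php
// Hypothetical helper (not part of the original answer): rewrites every
// <a> whose href starts with "showinfo:3875" and leaves other links alone.
function rewrite_showinfo_links( string $html ): string
{
    $dom = new DOMDocument();
    $dom->loadHTML( $html );

    $xpath = new DOMXPath( $dom );

    // starts-with() matches on the beginning of the href attribute value
    $links = $xpath->query( "//a[starts-with(@href, 'showinfo:3875')]" );

    foreach ( $links as $link ) {
        // Build the new target from the link text, as asked in the question
        $link->setAttribute( 'href', 'http://somelink.com/' . $link->textContent );
    }

    // Without an argument, saveHTML() serializes the whole document
    return $dom->saveHTML();
}

// Sample input (assumed): one link to convert, one to leave untouched
$sample = '<p><a href="showinfo:3875//12">foo</a> <a href="showinfo:1//3">bar</a></p>';
echo rewrite_showinfo_links( $sample );
```

If the link text can contain spaces or other characters that are not URL-safe, passing it through `rawurlencode()` before building the href is probably a good idea.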

Or, for example, by tag name with the DOMDocument methods:

// Check what we got so we have something to compare
var_dump( 'BEFORE', $html );

foreach ( $dom->getElementsByTagName( 'a' ) as $tag )
{
    if (
        $tag->hasAttribute( 'href' )
        and stristr( $tag->getAttribute( 'href' ), 'showinfo:3875' )
        )
    {
        $tag->setAttribute( 'href', "http://somelink.com/{$tag->textContent}" );
    }
}

// Now save the whole converted document.
// (Passing only the last matched node would output just that one element.)
$html = $dom->saveHTML();

// Check if it worked:
var_dump( 'AFTER', $html );

It's as easy as that.
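When the input is not valid HTML, `loadHTML()` may emit warnings. As a sketch of one way to handle that without the `@` operator, libxml's internal error handling can collect the messages instead of printing them. The helper name `load_html_quietly` and the sample markup are assumptions for illustration only.

```php
<?php
// Hypothetical helper: loads possibly malformed HTML and returns the
// document together with whatever parser errors libxml recorded.
function load_html_quietly( string $html ): array
{
    $dom = new DOMDocument();

    // Collect parser errors internally instead of raising PHP warnings
    $previous = libxml_use_internal_errors( true );

    $dom->loadHTML( $html );

    // Array of LibXMLError objects (may be empty)
    $errors = libxml_get_errors();

    // Empty the error buffer and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    return [ $dom, $errors ];
}

// Sample input (assumed) containing a tag that is not real HTML
[ $dom, $errors ] = load_html_quietly( '<a href="showinfo:3875//1">x</a><bogus>oops</bogus>' );

foreach ( $errors as $error ) {
    // Each error carries a message and the line it was found on
    echo trim( $error->message ), ' (line ', $error->line, ')', PHP_EOL;
}
```

The returned `$dom` can then be processed with the loop or the XPath query in the same way as before.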

kaiser
  • Edit: added the final solution in the question – molbal Jan 17 '15 at 19:09
  • @molbal Some notes about your edited question: You may want to use `stristr()` instead (see the edit to the answer). You also don't need to check for `!= false`. Omitting it is the same. But if you do, at least make a typesafe check with `!==`. If you **get warnings**, then please follow [this answer](http://stackoverflow.com/questions/1148928/disable-warnings-when-loading-non-well-formed-html-by-domdocument-php/17559716#17559716) to see how to suppress the warnings. Or just _fix_ your HTML if you are in control of it :) – kaiser Jan 17 '15 at 22:11
  • Thanks for the follow-up! I've switched to stristr, and will do a typesafe check. Unfortunately the HTML is coming from an API source (which does not really follow any markup standard, it's just HTML-like) that I cannot control. The best I can do is to convert those links so they point to other 3rd party software. – molbal Jan 17 '15 at 23:08