
EDIT: I'm not parsing HTML like the 5 billion other questions that have been posted. This is raw, unformatted text that I want to convert into HTML.

I'm working on post processing. I need to convert URLs with image extensions (jpe?g|png|gif) into image tags, and all other URLs into href links. My image replacement works, but I'm stuck on keeping the link replacement from overwriting the image replacement.

I need help with the expression: how do I get it to look only for URLs that aren't already inside the tags from the image replacement, or only for URLs that do not end in .jpe?g|png|gif?

public function smartConvertPost($post) {

    /**
     * Match image based urls
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+\.(?:jpe?g|png|gif))!Ui';
    $replace = '<p><img src="http://$1"></p>';
    $postImages = preg_replace($pattern, $replace, $post);

    /**
     * Match url based
     * ('!' delimiter, because the pattern itself contains '/')
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+(?:\S|$))!i';
    $replace = '<a href="http://$1">http://$1</a>';
    $postUrl = preg_replace($pattern, $replace, $postImages);

    return $postUrl;
}

Please note I am not talking about matching tags or HTML. I'm matching a plain string like the one below and converting it to HTML.

If this was an example post with a URL to a page like http://www.some-website.com/some-page/anything.html and I also put in a URL to an image like http://www.some-website.com/someimage.jpg, you would need the regex to turn the first into a hyperlink and the second into an image.

Thanks,

LeviXC
    I'm 90% positive the links to the right under "Related" have this answered 4,5 or maybe 6 times over. Please take a moment to browse the questions listed after you type yours before posting. – Brad Christie Mar 21 '11 at 14:33
  • @Brad 4, 5, or maybe 6? I'd say many more. – Clement Herreman Mar 21 '11 at 14:34
  • @ClementHerreman: More or less just referencing within that haystack. ;-) If you did a google `site:stackoverflow.com url to img anchor`, you'd be guaranteed hundreds/thousands. ;-) – Brad Christie Mar 21 '11 at 14:35
  • No, because they are related to HTML tags. We are talking about strings like this post, except it would look more like this: – LeviXC Mar 21 '11 at 14:40
  • Example with a link http://link-to-someplace.com/anything/ and then an image link like so http://link-to-some-images.com/imagename.jpg. You need to regex that and make the page URLs href links and the image URLs image tags. – LeviXC Mar 21 '11 at 14:42
  • @Levi: HTML is HTML, even when it's short like a comment, or just part of a page. Regex just can't handle all the cases: ill-formed HTML, HTML 4, and many more hurdles. – Clement Herreman Mar 21 '11 at 14:43

3 Answers


Brad Christie's preg_replace_callback() recommendation is a good one. Here is one possible implementation:

function smartConvertPost($post)
{ // Disclaimer: This "URL plucking" regex is far from ideal.
    $pattern = '!http://[a-z0-9\-._~\!$&\'()*+,;=:/?#[\]@%]+!i';
    $replace='_handle_URL_callback';
    return preg_replace_callback($pattern,$replace, $post);
}

function _handle_URL_callback($matches)
{ // preg_replace_callback() is passed one parameter: $matches.
    if (preg_match('/\.(?:jpe?g|png|gif)(?:$|[?#])/i', $matches[0]))
    { // This is an image if the path ends in .gif, .png, .jpg or .jpeg (any case).
        return '<p><img src="'. $matches[0] .'"></p>';
    } // Otherwise handle as NOT an image.
    return '<a href="'. $matches[0] .'">'. $matches[0] .'</a>';
}
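For PHP 7+, the same single-pass `preg_replace_callback()` idea can be sketched with an anonymous function instead of a named callback. This is a hypothetical variant, not the code above: the name `smartConvertPostClosure`, the looser URL pattern `https?://[^\s<>"]+`, and the `htmlspecialchars()` escaping are assumptions added for illustration.

```php
<?php
// Hypothetical closure-based variant; the URL pattern is a simplified
// assumption, not a full URL grammar.
function smartConvertPostClosure(string $post): string
{
    return preg_replace_callback(
        // Pluck "http://" or "https://" plus everything up to whitespace, <, > or ".
        '!https?://[^\s<>"]+!i',
        function (array $m): string {
            $url  = $m[0];
            // Escape &, quotes, < and > before the URL goes into HTML.
            $safe = htmlspecialchars($url, ENT_QUOTES);
            // Image if the path ends in .gif/.png/.jpg/.jpeg, optionally
            // followed by a ?query or #fragment (case-insensitive).
            if (preg_match('/\.(?:jpe?g|png|gif)(?:$|[?#])/i', $url)) {
                return '<p><img src="' . $safe . '"></p>';
            }
            return '<a href="' . $safe . '">' . $safe . '</a>';
        },
        $post
    );
}
```

Because every URL is matched exactly once and classified inside the callback, the image and link replacements can never overwrite each other.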

Note that the regex used to pluck out a URL is not ideal; doing it right is tricky. See the following resources:

Edit: Added ability to recognize image URLs having a query or fragment.
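As an alternative to the regex test (a swapped-in technique, not part of this answer), the "is it an image?" decision can be made from the URL's path component alone, so any query or fragment is ignored automatically. The helper name `isImageUrl` is hypothetical; it relies only on PHP's built-in `parse_url()` and `pathinfo()`.

```php
<?php
// Hypothetical helper: decide "image or not" from the path's extension.
function isImageUrl(string $url): bool
{
    // e.g. "/pic.JPG" for "http://x.com/pic.JPG?w=1#top"
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        // null when there is no path, false when the URL is seriously malformed
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['jpg', 'jpeg', 'png', 'gif'], true);
}
```

It could stand in for the `preg_match()` check inside `_handle_URL_callback()`; a side benefit is that a path such as `/a.gifts/page` is not mistaken for a GIF.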

ridgerunner

Since this is the 215247th post on that kind of topic, let's say it again: HTML is too complicated to parse with regex. Use a parser. See this: Regular expression for parsing links from a webpage?

PS: no offense =).

Edit:

I personally often use Symfony, and there's a really great parser for what you need: http://fabien.potencier.org/article/42/parsing-xml-documents-with-css-selectors

You can get all images using a simple CSS expression on your HTML. Give it a try.

Clement Herreman
  • None taken, but that's not what I asked. I'm not parsing webpages. I'm parsing raw strings with no tags that I want to convert into tags. – LeviXC Mar 21 '11 at 14:46
  • @Levi: This has been done before, that's the basic premise of the responses. It's been done in javascript, php, html, perl, --you name it. Basically, pass a string to the regex engine (I'd recommend [preg_replace_callback](http://www.php.net/manual/en/function.preg-replace-callback.php)) and have it return all URL matches (I'll let you decide the pattern). Then, in the callback decide if it's "image-worthy" or "anchor-worthy" and return the newly-formatted result. – Brad Christie Mar 21 '11 at 14:51

What about using a marker?


public function smartConvertPost($post) {
    $MY_MARKER = '<MYMARKER>'; // Define the marker here
    $quotedMarker = preg_quote($MY_MARKER, '~'); // regex-safe copy of the marker

    /**
     * Match image based urls
     */
    $pattern = '!http://([a-z0-9\-\.\/\_]+\.(?:jpe?g|png|gif))!Ui';
    $replace = '<p><img src="' . $MY_MARKER . 'http://$1' . $MY_MARKER . '"></p>'; // Use it here...
    $postImages = preg_replace($pattern, $replace, $post);

    /**
     * Match url based: skip any URL wrapped in markers
     * ('~' delimiter, because the lookarounds contain '!')
     */
    $pattern = '~(?<!' . $quotedMarker . ')http://([a-z0-9\-\.\/\_]+(?:\S|$))(?!' . $quotedMarker . ')~i'; // ...here
    $replace = '<a href="http://$1">http://$1</a>';
    $postUrl = preg_replace($pattern, $replace, $postImages);

    /**
     * Remove all markers
     */
    $postUrl = str_replace($MY_MARKER, '', $postUrl);

    return $postUrl;
}

Choose a marker that has no chance of appearing in the post. HTH
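A marker-free variant of the same two-pass idea, sketched under assumptions: instead of a marker, the second pass uses the negative lookbehind `(?<!src=")` to skip any URL that the first pass already wrote into an `src` attribute. The function name `convertPostTwoPass` and the `example.com` URLs are hypothetical, and the URL character class is the question's simplified one, so URLs containing `?`, `&` or `%` are only partly matched.

```php
<?php
// Hypothetical marker-free two-pass conversion (names and URLs are illustrative).
function convertPostTwoPass(string $post): string
{
    // Pass 1: image URLs become <img> tags. The trailing (?!...) stops a
    // partial match such as ".gif" inside "a.gifts/page".
    $post = preg_replace(
        '~http://([a-z0-9\-._/]+\.(?:jpe?g|png|gif))(?![a-z0-9\-._/])~i',
        '<p><img src="http://$1"></p>',
        $post
    );

    // Pass 2: remaining URLs become links. (?<!src=") skips any URL that
    // pass 1 just wrote directly after src=".
    return preg_replace(
        '~(?<!src=")http://([a-z0-9\-._/]+)~i',
        '<a href="http://$1">http://$1</a>',
        $post
    );
}
```

The lookbehind only works because pass 1 always writes `src="` immediately before the URL; if the tag format changes, the lookbehind has to change with it.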

Stephan