I almost have it, but it's not working 100%. I would like to strip everything from a string and return only the image URL. If the string has more than one image, the image URLs should be separated by a comma (","). I started with this answer and got this far:

Example string (this will change, but all I need are the image URLs, comma-delimited if there is more than one):

<table border="0" cellpadding="8"><tr><td width="80px"><a href="https://www.ebay.com/itm/Vintage-Elegant-Clear-Glass-Light-Shade-Ceiling-3-holes-Large-Flower/183189652718?hash=item2aa6f444ee:g:ji8AAOSwzpFa20P3"><img border="0" src="https://i.ebayimg.com/thumbs/images/g/ji8AAOSwzpFa20P3/s-l225.jpg"></a></td><td><div><span><strong>$15.00</strong></span></div><div>End Date: <span>May-21 07:03</span></div><div>Buy It Now for only: US $15.00</div><a href="https://www.ebay.com/itm/Vintage-Elegant-Clear-Glass-Light-Shade-Ceiling-3-holes-Large-Flower/183189652718?hash=item2aa6f444ee:g:ji8AAOSwzpFa20P3">Buy it now</a><span> | </span><a href="http://cgi1.ebay.com/ws/eBayISAPI.dll?MfcISAPICommand=MakeTrack&item=183189652718&ssPageName=RSS:B:SHOP:US:104">Add to watch list</a></td></tr></table>

The PHP:

<?php
function getImageUrlFromEbay($content = null) {
    if( !empty($content)){
        $imgSrc = preg_replace("/(<img\\s)[^>]*(src=\\S+)[^>]*(\\/?>)/i", "$1$2$3", $content);  
        return $imgSrc;
    }
}
?>

Here is a preview of what my current function returns: [screenshot]

How can I make sure the function returns only the image URLs?

Derek
  • You don't use regex to parse HTML: https://stackoverflow.com/a/1732454/5721385 – DZDomi Apr 23 '18 at 22:13
  • look into simplexml and xpath ;) – DZDomi Apr 23 '18 at 22:15
  • The better question is why you even have an HTML string to begin with; a string should never contain a wad of HTML. This sounds like an [**XY problem**](https://meta.stackexchange.com/a/66378), which you're probably approaching the [**wrong way**](https://meta.stackexchange.com/a/233676). What problem are you trying to solve by doing this? What should the end result be? How does **this approach** help you get there? Please provide some **context** surrounding your question to help clarify your **intent**. – Obsidian Age Apr 23 '18 at 22:18
  • @DZDomi Thanks, but all the examples I found searching Google for "php strip all but image url site:stackoverflow.com" used regex. Not sure why that was cause for the downvote, but thanks for the link to the further explanation. – Derek Apr 23 '18 at 22:19
  • I didn't add this to the question because I did not feel it was relevant, but I'm trying to parse an XML feed whose "description" contains HTML, and I just need the image URL. I'm writing my function in WP All Import to return just the image URL so I can import it into WordPress. My question is about how to extract the image URL, but here is the additional context. @ObsidianAge – Derek Apr 23 '18 at 22:21
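
The comments above point toward an HTML/XML parser with XPath rather than regex. A minimal sketch of that idea follows, assuming PHP's bundled DOM extension is available. The function name `extractImageUrlsWithDom` and the sample markup are hypothetical, chosen only for illustration, and this is not necessarily what the commenters had in mind.

```php
<?php
// Hypothetical helper: collects <img> src values with DOMDocument + DOMXPath
// instead of a regular expression.
function extractImageUrlsWithDom(string $html): array
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Feed descriptions are often sloppy HTML; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];

    // Select the src attribute of every <img> element that has one.
    foreach ($xpath->query('//img/@src') as $attr) {
        $urls[] = $attr->nodeValue;
    }

    return $urls;
}

// Illustrative input only (example.com URLs are placeholders).
$sample = '<div><a href="https://example.com/item"><img border="0" src="https://example.com/a.jpg"></a>'
        . '<img src="https://example.com/b.jpg"></div>';

echo implode(',', extractImageUrlsWithDom($sample)), PHP_EOL;
```

Returning an array keeps the choice of delimiter with the caller; `implode(',', ...)` produces the comma-separated form asked for in the question.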

1 Answer

A rough way to do this with regex (assuming valid HTML) is:

if (preg_match_all('/<img .*?src=[\'"]([^\'"]+)/i', $str, $matches) > 0) {
    $images = implode(',', $matches[1]);
} else {
    $images = '';
}

Returning the array in $matches[1] might work better than a comma-separated string, since in theory a URL could itself contain a comma.

Rather than filtering out the HTML not part of the image src, just match on the src.
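
As a sketch of both points, the same pattern can be wrapped in a function that returns the array and leaves the joining to the caller. The function name `extractImageUrls` and the sample markup are assumptions made for illustration and are not part of the original question or of WP All Import.

```php
<?php
// Hypothetical helper wrapping the preg_match_all() pattern shown above.
function extractImageUrls(?string $html): array
{
    if ($html === null || $html === '') {
        return [];
    }

    // Capture group 1 holds the text between the quotes of each src attribute.
    if (preg_match_all('/<img .*?src=[\'"]([^\'"]+)/i', $html, $matches) > 0) {
        return $matches[1];
    }

    return [];
}

// Illustrative input only (example.com URLs are placeholders).
$html = '<a href="https://example.com/item"><img border="0" src="https://example.com/a.jpg"></a>'
      . "<img src='https://example.com/b.jpg'>";

$urls = extractImageUrls($html);

// Join only at the point where a comma-separated string is actually required.
echo implode(',', $urls), PHP_EOL;
```

Like the snippet above, this assumes reasonably well-formed markup with quoted `src` attributes; unquoted attributes would not match.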

drew010