I have spent some time on this problem, and this is as far as I've got. I'm trying to pull images from RSS feeds, using Magpie to process the feeds. The snippet below sits inside a class:

function getImagesUrl($str) {
    $a = array();
    $pos = 0;

    // Handle one occurrence of "img" per pass; !== false also accepts position 0.
    while (($pos = strpos($str, "img", $pos)) !== false) {
        $topos = strpos($str, ">", $pos);
        if ($topos === false) {
            break; // unterminated tag: stop instead of looping forever
        }
        $imagetag = substr($str, $pos, $topos - $pos);
        $a[] = $this->getImageUrl($imagetag);
        $pos = $topos;
    }
    return $a;
}


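/*
 * Get the full URL inside the src attribute of an <img> tag.
 * Returns an empty string when no double-quoted src value is found.
 */
function getImageUrl($image) {
    $p = strpos($image, "src=");
    if ($p === false) {
        return '';
    }
    $p += 5; // skip past src="
    $tp = strpos($image, '"', $p); // closing quote of the src value
    if ($tp === false) {
        return '';
    }
    return substr($image, $p, $tp - $p);
}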

I call these functions as follows. So far this produces the output pasted further down:

@$rss = fetch_rss($rsso->url);
if (@$rss)
{
    $items = $rss->items;
    foreach ($items as $item)
    {
        if (isset($item['title']) && isset($item['description']))
        {
            $hash = md5($this->es($item['title']).$this->es($item['description']));
            $content = $item['content'];
            foreach ($content as $c) {
                // get the images in the content
                $arr = $this->getImagesUrl($c);
                print_r($arr);
            }
        }
    }
}

Here is an example of the output:

    Array
    (
        [0] => http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/
        [1] => http://cdn.mashable.com/wp-content/plugins/wp-digg-this/i/gbuzz-feed.png
        [2] => http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg
        [3] => http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png
        [4] => http://ec.mashable.com/wp-content/uploads/2009/01/bizspark2.gif
        [5] => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png
        [6] => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png
        [7] => http://cdn.mashable.com/wp-content/uploads/2009/02/bizspark.jpg
        [8] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di
        [9] =>
        [10] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/1/di
        [11] =>
        [12] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:D7DqB2pKExk
        [13] =>
        [14] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:V_sGLiPBpWU
        [15] =>
        [16] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:F7zBnMyn0Lo
        [17] =>
        [18] => http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs
        [19] =>
        [20] => http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM
        [21] =>
        [22] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:gIN9vFwOqvQ
        [23] =>
        [24] => http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA
        [25] =>
        [26] => http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok
        [27] =>
        [28] => http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI
        [29] =>
        [30] => http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A
        [31] =>
        [32] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:_cyp7NeR2Rw
        [33] =>
        [34] => http://feeds.feedburner.com/~r/Mashable/~4/0N_mvMwPHYk
    )

Is there a way to filter out the correct image URLs? First, I would like to drop URLs that don't end in an image extension such as jpg, png or gif. Second, I would like to discard URLs containing words such as bizspark, digg, facebook, tweet or twitter. Has anybody found an easier way of doing this? Please help me out.
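
To make the two filtering rules concrete, here is a rough sketch of what such a filter might look like. The function name filterImageUrls and both lists are hypothetical (they are not part of Magpie), and the check looks only at the URL path, so it assumes the extension appears there rather than in the query string.

```php
<?php
// Hypothetical helper, not part of Magpie: keeps only URLs whose path ends in
// an allowed extension and that contain none of the blocked words.
function filterImageUrls(array $urls, array $extensions, array $blocked)
{
    $kept = array();
    foreach ($urls as $url) {
        if (!is_string($url) || $url === '') {
            continue; // skip empty entries
        }
        // Use the path only, so a query string such as ?url=... is ignored.
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue;
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (!in_array($ext, $extensions, true)) {
            continue; // no extension, or not an allowed one
        }
        foreach ($blocked as $word) {
            if (stripos($url, $word) !== false) {
                continue 2; // blocked word found: move on to the next URL
            }
        }
        $kept[] = $url;
    }
    return $kept;
}

// Example lists taken from the question; adjust them as needed.
$extensions = array('jpg', 'jpeg', 'png', 'gif');
$blocked = array('bizspark', 'digg', 'facebook', 'tweet', 'twitter');
$urls = array(
    'http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/',
    'http://cdn.mashable.com/wp-content/uploads/2010/09/web.png',
    '',
    'http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di',
);
print_r(filterImageUrls($urls, $extensions, $blocked));
```

A word list like this is blunt: it also drops any legitimate image whose URL happens to contain one of the words.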

Sir Lojik

1 Answer

I posted an answer to your related question, "Pulling Images from rss/atom feeds using magpie rss".

To apply that answer to your code above, first make the changes to rss_parse.inc as described in my previous answer. You can then read the image URLs directly from Magpie instead of writing any extra functions, for example:

// Your code
@$rss = fetch_rss($rsso->url);
if (@$rss)
{
   $items=$rss->items;
   foreach ($items as $item ) 
   {
      if (isset($item['title'])&&isset($item['description']))
      {
         // START MY EDIT
         if (isset($item['enclosure_type']) && isset($item['enclosure_url'])){
            switch ($item['enclosure_type']){
               case "image/gif":
               case "image/jpeg":
               case "image/png":
                   $image_url=$item['enclosure_url'];
                   $image_length=$item['enclosure_length'];
                   break;
            }
         }
         //END MY EDIT
       }
   }
}

And that's it! You just have to use the $image_url variable to display your image (in an img tag, of course :-)

I have only checked for jpg, gif and png images in the code above as they're the most popular, but you can add other MIME types to the switch if you need to. Just be aware that the enclosure type is set by the creator of the RSS feed and is not read from the file, so it may not be accurate. You might want to use exif_imagetype() on the image file itself to ensure it actually is an image.
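
As a minimal sketch of that check: the helper below (looksLikeImage is a hypothetical name, not part of Magpie) calls exif_imagetype() when the exif extension is loaded and otherwise falls back to getimagesize(), which is a substitution on my part. It assumes $path is a local file; passing a remote URL would additionally require allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: returns true when the file's leading bytes identify it
// as a GIF, JPEG or PNG, regardless of what the feed's enclosure type claims.
function looksLikeImage($path)
{
    if (function_exists('exif_imagetype')) {
        // Returns an IMAGETYPE_* constant, or false for non-images.
        $type = @exif_imagetype($path);
    } else {
        // Fallback (an assumption, not from the answer): index 2 of the
        // getimagesize() result holds the same IMAGETYPE_* constant.
        $info = @getimagesize($path);
        $type = ($info === false) ? false : $info[2];
    }
    return in_array($type, array(IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG), true);
}
```

You could call this on a downloaded copy of $image_url before displaying it and skip the item when it returns false.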

Hope this helps if it's not too late!

FluffyKitten