Hi all, I have the array shown below:

Array
(
    [0] => http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/ 
    [1] => http://cdn.mashable.com/wp-content/plugins/wp-digg-this/i/gbuzz-feed.png 
    [2] => http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg 
    [3] => http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png 
    [4] => http://ec.mashable.com/wp-content/uploads/2009/01/bizspark2.gif 
    [5] => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png 
    [6] => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png 
    [7] => http://cdn.mashable.com/wp-content/uploads/2009/02/bizspark.jpg 
    [8] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di 
    [9] => 
    [10] => http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/1/di 
    [11] => 
    [12] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:D7DqB2pKExk 
    [13] => 
    [14] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:V_sGLiPBpWU 
    [15] => 
    [16] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:F7zBnMyn0Lo 
    [17] => 
    [18] => http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs 
    [19] => 
    [20] => http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM 
    [21] => 
    [22] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:gIN9vFwOqvQ 
    [23] => 
    [24] => http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA 
    [25] => 
    [26] => http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok 
    [27] => 
    [28] => http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI 
    [29] => 
    [30] => http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A 
    [31] => 
    [32] => http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:_cyp7NeR2Rw 
    [33] => 
    [34] => http://feeds.feedburner.com/~r/Mashable/~4/0N_mvMwPHYk
)

Basically, I want to:

  1. Remove every empty array element.
  2. Remove every array item whose name does not end in one of the extensions .jpg, .png or .gif.
  3. Finally, remove array items containing keywords such as "digg", "fb", "tweet" or "bizspark".

Hi, I've tried the above code, but it returns an array containing exactly the items I want removed:

Array ( [5] =>
http://feedads.g.doubleclick.net/~at/W-z_kHMi30EtE1mpxK8NvMmNmeg/0/di
[7] =>
http://feedads.g.doubleclick.net/~at/W-z_kHMi30EtE1mpxK8NvMmNmeg/1/di
[9] =>
http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:D7DqB2pKExk
[11] =>
http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:V_sGLiPBpWU
[13] =>
http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:F7zBnMyn0Lo
[15] =>
http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs
[17] =>
http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM
[19] =>
http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:gIN9vFwOqvQ
[21] =>
http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA
[23] =>
http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok
[25] =>
http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI
[27] =>
http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A
[29] =>
http://feeds.feedburner.com/~ff/Mashable?i=mEedXAp78pg:339cIishd6A:_cyp7NeR2Rw
[31] =>
http://feeds.feedburner.com/~r/Mashable/~4/mEedXAp78pg
)

I would like it to return, e.g. for the first example:

    [5] => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png
    [6] => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png

Any ideas?


Hi GZipp, I have modified the code and I'm getting better results:

function url_array_filter($url)
{
    static $words = array('digg', 'fb', 'tweet', 'bizspark','feedburner','feedads','CountImage');
    static $extens = array('.jpg', '.png', '.gif');
    $ret = true;
    if (!$url) {
        $ret = false;
    } elseif (str_replace($words, '', $url) != $url) {
        $ret = false;
    } else {
        $path = parse_url($url, PHP_URL_PATH);
        if (in_array(substr($path, -4), $extens)) {
            $ret = false;
        }
    }
    return $ret;
} 

My problem now is with the output, e.g.:

Array ( [0] => http://cdn.dzone.com/images/thumbs/120x90/491551.jpg' style='width:120;height:90;float:left;vertical-align:top;border:1px solid ) 

Array ( [0] => http://cdn.dzone.com/images/thumbs/120x90/490913.jpg' style='width:120;height:90;float:left;vertical-align:top;border:1px solid ) 

I want the URL only. I think the problem lies in how I extract the URLs from the original content. Let me post a link to the original question and what I'm doing:

RSS Feeds and image extraction indepth

I simply want the URL. Judging from that link, I think getImagesUrl() may be messing things up. I'm going to try using parse_url to bring back the correct URL. Let me know if I'm on the right track. I'm very close to pulling image URLs from RSS feeds parsed with Magpie.


OK GZipp, this is the modification and addition I've made to your code. It works about 95% of the time, which is great, although I still get some odd results, which I'm posting below.

function url_array_filter($url)
{
    static $words = array('digg', 'fb', 'tweet', 'bizspark','feedburner','feedads','CountImage','fuelbrand');
    static $extens = array('.jpg', '.png', '.gif');
    $ret = true;
    if (!$url) {
        $ret = false;
    } elseif (str_replace($words, '', $url) != $url) {
        $ret = false;
    } else {
        $path = parse_url($url, PHP_URL_PATH);
        if (in_array(substr($path, -4), $extens)) {
            $ret = false;
        }
    }
    return $ret;
} 

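function cleanURL($a_url)
{
    $ret = array();
    foreach ($a_url as $c) {
        // Rebuild the URL from scheme, host and path only (drops any query string).
        $a = parse_url($c, PHP_URL_SCHEME) . '://' . parse_url($c, PHP_URL_HOST) . parse_url($c, PHP_URL_PATH);
        // Keep only the part before the first single quote (e.g. trailing HTML attributes).
        $a = explode("'", $a);
        $ret[] = $a[0];
    }
    return $ret;
}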

Example usage. The call $this->getImagesUrl($c) below returns results like those in the first part of the question.

foreach ($content as $c) {
    // get the images in content
    $arr = $this->getImagesUrl($c);
    $arr = array_filter($arr, 'url_array_filter');
}
$ret = cleanURL($arr);
if (count($ret) > 0) {
    print_r($ret);
    echo "<br/><br/>";
}

Up to this point almost everything works, but I keep getting some bad results like:

Array ( [0] => http://cdn.mashable.com/wp-content/uploads/2010/02/ipad-side- )
Array ( [0] => http://mrg.bz/FZtr2k [1] => http://mrg.bz/IDkx4w ) 

We're almost there. Any ideas?
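One possible reading of the results above: as posted, url_array_filter() returns false when the path does end in .jpg, .png or .gif, which looks like the reverse of requirement 2, so entries such as the mrg.bz links (no image extension) pass through, and the dzone URLs probably survive only because of the trailing `' style=...` text. The following is a minimal sketch of cleaning first and filtering second. The helper names clean_image_url() and is_wanted_image() are hypothetical, and it assumes the extracted strings carry leftover HTML after a quote or whitespace. It does not explain the truncated "ipad-side-" entry, which likely comes from the extraction step.

```php
<?php
// Hypothetical helper: cut the string at the first quote, angle bracket
// or whitespace, so trailing HTML attribute text is discarded.
function clean_image_url(string $raw): string
{
    $raw = trim($raw);
    return substr($raw, 0, strcspn($raw, "'\"<> \t\r\n"));
}

// Hypothetical helper: keep a URL only if it is non-empty, contains none
// of the keywords, and its path ends in an allowed image extension.
function is_wanted_image(string $url): bool
{
    static $words  = array('digg', 'fb', 'tweet', 'bizspark', 'feedburner', 'feedads');
    static $extens = array('jpg', 'png', 'gif');

    if ($url === '') {
        return false;
    }
    foreach ($words as $word) {
        if (stripos($url, $word) !== false) {
            return false;
        }
    }
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path, or a URL parse_url() could not handle
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $extens, true);
}

// Sample input taken from the outputs shown in the question.
$raw = array(
    "http://cdn.dzone.com/images/thumbs/120x90/491551.jpg' style='width:120;height:90;float:left;vertical-align:top;border:1px solid",
    'http://mrg.bz/FZtr2k',
    '',
);

// Clean every entry, filter, then reindex from 0.
$images = array_values(array_filter(array_map('clean_image_url', $raw), 'is_wanted_image'));
print_r($images);
```

Cleaning before filtering matters here: the extension test only makes sense once the string ends where the URL ends.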

Sir Lojik

2 Answers

7

Using, e.g., array_filter() will give you flexibility and ease of maintenance (changing requirements, debugging, etc.):

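function url_array_filter($url)
{
    // Keywords that disqualify a URL, and the image extensions to keep.
    static $words = array('digg', 'fb', 'tweet', 'bizspark');
    static $extens = array('.jpg', '.png', '.gif');
    $ret = true;
    if (!$url) {
        // Rule 1: drop empty elements.
        $ret = false;
    } elseif (str_replace($words, '', $url) != $url) {
        // Rule 3: drop URLs containing any of the keywords.
        $ret = false;
    } else {
        // Rule 2: drop URLs whose path does not end in an allowed extension.
        $path = parse_url($url, PHP_URL_PATH);
        if (!in_array(substr($path, -4), $extens)) {
            $ret = false;
        }
    }
    return $ret;
}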

$arr = array_filter($arr, 'url_array_filter');
print_r($arr);

(Works for the array given, but may need changes; it's demo code.)

GZipp
    Changing substr($path, -4) to strrchr($path, '.') will get rid of the integer constant. – GZipp Sep 25 '10 at 20:49
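
The suggestion in this comment can be sketched roughly as follows. strrchr($path, '.') returns everything from the last dot onward (or false when there is no dot), so the check no longer depends on every extension being four characters long. The function name has_image_extension() and the example.com URL are hypothetical, and adding '.jpeg' to the list is an assumption used only to show the benefit.

```php
<?php
// Hypothetical helper: true when the URL's path ends in an allowed extension.
function has_image_extension(string $url): bool
{
    // '.jpeg' is added here only for illustration; it is not in the original list.
    static $extens = array('.jpg', '.jpeg', '.png', '.gif');

    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path component, or an unparsable URL
    }

    // Everything from the last '.' in the path, or false if there is none.
    $ext = strrchr($path, '.');

    return $ext !== false && in_array(strtolower($ext), $extens, true);
}

var_dump(has_image_extension('http://cdn.mashable.com/wp-content/uploads/2010/09/web.png'));
var_dump(has_image_extension('http://example.com/photo.jpeg'));
var_dump(has_image_extension('http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di'));
```

With substr($path, -4) the '.jpeg' case would compare 'jpeg' against the list and fail, which is the kind of hidden length assumption the comment is pointing at.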
4
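foreach ($array as $key => $value) {
    if (
        // Rule 1: empty element.
        empty($value) ||
        // Rule 2: not an http URL ending in .gif, .png or .jpg.
        (preg_match('#^http://(.*)\.(gif|png|jpg)$#i', $value) == 0) ||
        // Rule 3: contains one of the unwanted keywords.
        (preg_match('#(digg|fb|tweet|bizspark)#i', $value) > 0)
    ) {
        unset($array[$key]);
    }
}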
Sergey Eremin
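
As an alternative to unsetting keys in a loop, the same three rules could probably be expressed with preg_grep(), which keeps (or, with PREG_GREP_INVERT, drops) the entries matching a pattern and preserves the original keys. This is a sketch using a subset of the array from the question. The variable names are hypothetical, and the trim() step is an assumption that stray whitespace around the URLs should be ignored.

```php
<?php
// A subset of the array from the question, with the original keys.
$urls = array(
    0 => 'http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/',
    2 => 'http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg',
    3 => 'http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png',
    5 => 'http://cdn.mashable.com/wp-content/uploads/2010/09/web.png',
    6 => 'http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png',
    8 => 'http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di',
    9 => '',
);

// Strip surrounding whitespace so the '$' anchor sees the real end of the URL.
$urls = array_map('trim', $urls);

// Rules 1 and 2: keep only http URLs ending in an image extension
// (empty strings cannot match, so they are dropped here as well).
$images = preg_grep('#^http://.*\.(gif|png|jpg)$#i', $urls);

// Rule 3: drop anything containing one of the unwanted keywords.
$images = preg_grep('#digg|fb|tweet|bizspark#i', $images, PREG_GREP_INVERT);

print_r($images);
```

For this sample only the entries with keys 5 and 6 remain. Note that short keywords such as "fb" match anywhere in the URL, so a longer keyword list may need tighter patterns.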