Hi, I'm parsing an XML file using PHP to create another XML file in a nicer format, which I will eventually use to populate an unordered HTML list.

But the XML feed has duplicate entries, so my formatted output has duplicates too. How can I loop through the feed and remove the duplicates? Using PHP if possible. I'm a bit of a newbie and am not sure what to do with this one.

Here is a typical output (my formatted XML with duplicates):

    <films>
    <film>
    <filmtitle>Death Race 2</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=377029</filmlink>
    </film>

    <film>
    <filmtitle>Death Race 2</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=377029</filmlink>
    </film>

    <film>
    <filmtitle>Shattered Glass</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=UKIC48</filmlink>
    </film>

    <film>
    <filmtitle>Shattered Glass</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=UKIC48</filmlink>
    </film>

    <film>
    <filmtitle>The Brothers Bloom</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=380196</filmlink>
    </film>

    <film>
    <filmtitle>The Brothers Bloom</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=380196</filmlink>
    </film>

...and so on...

Any help would be great. Thanks.

UPDATE:

I have defined an array before looping through the feed like this:

$filmList = array();

While looping through the feed I add entries using:

array_push($filmList, array("filmTitle" => $title, "pictureLink" => $pictureLink));

where $title and $pictureLink are the values from the parsed XML. How would I remove duplicates from that? Or stop them from being added in the first place?

Thanks...

Adam Waite

2 Answers

Try this:

<?php
$str=<<<'EOT'
    <films>
    <film>
    <filmtitle>Death Race 2</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=377029</filmlink>
    </film>

    <film>
    <filmtitle>Death Race 2</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=377029</filmlink>
    </film>

    <film>
    <filmtitle>Shattered Glass</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=UKIC48</filmlink>
    </film>

    <film>
    <filmtitle>Shattered Glass</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=UKIC48</filmlink>
    </film>

    <film>
    <filmtitle>The Brothers Bloom</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=380196</filmlink>
    </film>

    <film>
    <filmtitle>The Brothers Bloom</filmtitle>
    <filmlink>http://www.picturebox.tv/watchnow?id=380196</filmlink>
    </film>
    </films>
EOT;

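// Parse the sample document into a SimpleXMLElement.
$xml=simplexml_load_string($str);

// Film links already encountered, used as a lookup set.
$seen=array();

$len=$xml->film->count();
for($i=0;$i<$len;$i++){
    $key=(string) $xml->film[$i]->filmlink;
    if (isset($seen[$key])) {
        // Duplicate link: remove the node, then step back because
        // the remaining <film> elements shift down by one.
        unset($xml->film[$i]);
        $len--;
        $i--;
    }else{
        $seen[$key]=1;
    }
}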

echo $xml->asXML();

?>

This removes duplicate <film> elements, using filmlink as the key.
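The same "seen" lookup can also keep duplicates out of the array from the question's update in the first place. The following is only a sketch: `$feedRows` is a hypothetical stand-in for whatever the feed loop yields, and the `$filmList`, `$title` and `$pictureLink` names are borrowed from the question.

```php
<?php
// Hypothetical stand-in for the values read from the feed (title, link).
$feedRows = array(
    array("Death Race 2", "http://www.picturebox.tv/watchnow?id=377029"),
    array("Death Race 2", "http://www.picturebox.tv/watchnow?id=377029"),
    array("Shattered Glass", "http://www.picturebox.tv/watchnow?id=UKIC48"),
    array("Shattered Glass", "http://www.picturebox.tv/watchnow?id=UKIC48"),
    array("The Brothers Bloom", "http://www.picturebox.tv/watchnow?id=380196"),
);

$filmList = array();
$seen = array(); // links that have already been added, used as a set

foreach ($feedRows as $row) {
    list($title, $pictureLink) = $row;

    // Skip any entry whose link has been added before.
    if (isset($seen[$pictureLink])) {
        continue;
    }

    $seen[$pictureLink] = true;
    $filmList[] = array("filmTitle" => $title, "pictureLink" => $pictureLink);
}

print_r($filmList);
```

Keying the lookup on the link rather than the title means two different films that happen to share a title would both be kept.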

stewe

Just put those pairs in an array, using the title as the key and the link as the value. Inserting a duplicate then simply overwrites the existing entry.

See this question for a discussion about Java hashmaps and PHP arrays.

Edit:

Something like this:

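// Pairs as they might arrive from the feed, including a duplicate.
$a = array(array("one", "one_link"), array("two", "two_link"), array("one", "one_link"));

$target = array();

// Assigning to an existing key overwrites it, so duplicates collapse.
foreach ($a as $pair)
   $target[$pair[0]] = $pair[1];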

This will get you:

array("one" => "one_link", "two" => "two_link")

With this setup, there is no need to check if the key already exists.
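Applied to the film data from the question, the idea might look like the sketch below. The helper names `uniqueFilms` and `renderList` are made up for illustration, and the sample XML is a shortened copy of the question's output.

```php
<?php
$xmlString = <<<'EOT'
<films>
<film>
<filmtitle>Death Race 2</filmtitle>
<filmlink>http://www.picturebox.tv/watchnow?id=377029</filmlink>
</film>
<film>
<filmtitle>Death Race 2</filmtitle>
<filmlink>http://www.picturebox.tv/watchnow?id=377029</filmlink>
</film>
<film>
<filmtitle>Shattered Glass</filmtitle>
<filmlink>http://www.picturebox.tv/watchnow?id=UKIC48</filmlink>
</film>
</films>
EOT;

// Hypothetical helper: build a title => link map; duplicate titles overwrite.
function uniqueFilms($xmlString) {
    $xml = simplexml_load_string($xmlString);
    $films = array();
    foreach ($xml->film as $film) {
        // Cast to string: SimpleXMLElement objects cannot be used as array keys.
        $films[(string) $film->filmtitle] = (string) $film->filmlink;
    }
    return $films;
}

// Hypothetical helper: render the map as an unordered HTML list.
function renderList(array $films) {
    $html = "<ul>\n";
    foreach ($films as $title => $link) {
        $html .= '  <li><a href="' . htmlspecialchars($link) . '">'
               . htmlspecialchars($title) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

$films = uniqueFilms($xmlString);
echo renderList($films);
```

One consequence of keying by title: if two entries share a title but have different links, only the last link survives.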

TPete
  • @AdamWaite You could check whether a key already exists in the array before inserting it, using `array_key_exists`. – TPete Mar 29 '12 at 09:27