
I've read and tried pretty much every "remove duplicate values from nested arrays" thread out there, and I believe this problem is slightly different: I am trying to remove entire duplicate branches from a (very) large multidimensional array. I guess this is more of a "remove duplicate arrays from an array" question?

I have a dump here on Pastebin to look at. I am trying to use a protected method I'm calling superUnique to weed out the dupes, but it is not working (shown below). What am I doing wrong?

/**
 * @param $array
 * @param bool $preserveKeys
 * @param array $hashes
 * @return array
 */
protected function superUnique($array, $preserveKeys = false, $hashes = array())
{
    $uniqueArray = array();

    foreach ($array AS $key => $value)
    {
        if (TRUE === is_array($value))
        {
            $hash = md5(serialize($value));

            if (FALSE === isset($hashes[$hash]))
            {
                $hashes[$hash] = $hash;
                $uniqueArray[$key] = $this->superUnique($value, $preserveKeys, $hashes);
            } else {
                // skip it i guess ?? should be a duplicate
            }

        } else {

            if ($preserveKeys)
            {
                $uniqueArray[$key] = $value;
            } else {
                $uniqueArray[] = $value;
            }
        }
    }
    return $uniqueArray;
}

Here is the code as it is run, followed by an example of the duplication in the arrays:

    $output = $this->superUnique($output, 1);

    foreach ($output AS $num => $arr)
    {
        // turns a multidim array to an object recursively
        $obj = $this->arrToObj($arr);

        if (isset($obj->message->body))
        {
            echo "Arr#:   {$num}\n";
            echo "Time:   {$obj->attributes->timestamp}\n";
            echo "Body:   {$obj->message->body}\n\n\n";
        }
    }

    die;

Here is a slice of my output, which shows a high level of duplication based on the Pastebin array.

Arr#:   172
Time:   2013-06-25T16:34:46-0700
Body:   ok, so we decided on everything then?


Arr#:   173
Time:   2013-06-25T16:34:46-0700
Body:   ok, so we decided on everything then?


Arr#:   174
Time:   2013-06-25T16:34:46-0700
Body:   ok, so we decided on everything then?


Arr#:   175
Time:   2013-06-25T16:34:46-0700
Body:   ok, so we decided on everything then?


Arr#:   176
Time:   2013-06-25T16:34:59-0700
Body:   yes, see you tomorrow


Arr#:   177
Time:   2013-06-25T16:34:59-0700
Body:   yes, see you tomorrow


Arr#:   178
Time:   2013-06-25T16:34:59-0700
Body:   yes, see you tomorrow


Arr#:   179
Time:   2013-06-25T16:34:59-0700
Body:   yes, see you tomorrow


Arr#:   180
Time:   2013-06-25T16:35:38-0700
Body:   are you still onlne?


Arr#:   181
Time:   2013-06-25T16:36:10-0700
Body:   hey bob
ehime
  • why `// skip it`? Delete it – vladkras Jul 25 '13 at 03:29
  • You can do this: http://stackoverflow.com/a/946300/945775 – AgmLauncher Jul 25 '13 at 10:42
  • possible duplicate of [How to remove duplicate values from a multi-dimensional array in PHP](http://stackoverflow.com/questions/307674/how-to-remove-duplicate-values-from-a-multi-dimensional-array-in-php) – AgmLauncher Jul 25 '13 at 10:43
  • @AgmLauncher This does not work as it is not duplicate **values** but duplicate arrays. Serializing these did not seem to work, as this was one of the attempts that I had made. – ehime Jul 25 '13 at 15:44
  • @AgmLauncher Here's the output after running a serialization map; you can still see heavy duplication in the array: http://pastebin.com/N4hQCEcR It does reduce this from 204 members to 152, but there are still quite a few duplicates left – ehime Jul 25 '13 at 15:50

1 Answer

Closing: there was no duplication after all. The to and from fields are not the same.

The solution I came up with was to remove the to and from message attributes before hashing, which takes those fields out of the program's comparison logic, and then re-attach them further down the line by matching the stored hashes to the current keys. Cheers, hope this helps someone.

protected $patterns  = array(
    '/((?=_).*?)@.*/',          // from the first underscore: keep the text up to the next @, drop the @ and the rest
    '/_+/i',                    // every run of underscores (turned back into @)
    '/.*\//i',                  // everything up to the last slash (group chat modifier to multiple people, same from)
);

protected $replace   = array(
    '$1',                       // replace with the captured group
    '@',                        // replace with (at)
    '',                         // replace with blank
);


..................


/**
 * Remove duplication
 *
 * @param $array
 * @return array
 *
 * NOTE: always want keys so removed a "preserve" flag
 */
protected function superUnique($array)
{
    $uniqueArray =
    $hashes      = array();

    foreach ($array AS $key => $value)
    {
        if (TRUE === is_array($value))
        {
            // secondary storage of array as object (still holds to/from)
            $obj = $this->arrToObj($value);

            // remove the fields that stop otherwise identical branches from matching
            // (done inside the is_array check: unset on a string offset is a fatal error)
            unset(
                $value['message']['attributes']['to'],
                $value['message']['attributes']['from']
            );

            // create our serializable hash
            $hash = md5(serialize($value));

            if (FALSE === array_key_exists($hash, $hashes))
            {
                // store as hashmap, remember place in array
                $hashes[$hash] = $key;

                // always preserve keys
                $uniqueArray[$key] = $value;

                // pregging inner content
                if (isset($obj->message->delay->attributes))
                {
                    foreach ($value['message']['delay']['attributes'] AS $name => $pregable)
                    {
                        $uniqueArray[$key]['message']['delay']['attributes'][$name] = $this->preg($pregable);
                    }
                }

                // initial hydration of array
                $uniqueArray[$key]['message']['attributes'][self::members] = array(
                    'to'    => array($this->preg($obj->message->attributes->to)),
                    'from'  => array($this->preg($obj->message->attributes->from)),
                );

            } else {

                // rehydrate array
                $uniqueArray[$hashes[$hash]]['message']['attributes'][self::members] = $this->fuse(
                    $uniqueArray[$hashes[$hash]]['message']['attributes'][self::members],
                    array(
                        'to'    => array($this->preg($obj->message->attributes->to)),
                        'from'  => array($this->preg($obj->message->attributes->from)),
                    )
                );
            }
        }

    }
    return $uniqueArray;
}

private function preg($value)
{
    return preg_replace($this->patterns, $this->replace, $value);
}

protected function fuse($input, $combine)
{
    $output = array();
    foreach ($input AS $key => $value)
    {
        $output[$key] = $value;

        $flip = array_flip($input[$key]);

        if(! isset($flip[$combine[$key][0]])) $output[$key][] = $combine[$key][0];
    }
    return $output;
}
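
For anyone who wants to try the idea outside the class, here is a minimal standalone sketch of the same approach: strip the fields that differ, hash what is left, and collect the stripped values on the first matching branch. The function name uniqueBranches, the plain 'members' key, and the sample data are assumptions made for illustration; they are not part of the code above, and the regex clean-up done by preg() is left out.

```php
<?php
/**
 * Hypothetical helper (not from the answer above): collapse branches that are
 * identical once the ignored attributes are removed, and collect the ignored
 * values on the first occurrence of each branch.
 */
function uniqueBranches(array $branches, array $ignore = array('to', 'from'))
{
    $unique = array();
    $seen   = array(); // hash => key of the first branch with that hash

    foreach ($branches as $key => $branch) {
        if (!is_array($branch)) {
            continue; // only array branches are compared
        }

        // pull the ignored attributes out before hashing
        $removed = array();
        foreach ($ignore as $name) {
            if (isset($branch['message']['attributes'][$name])) {
                $removed[$name] = $branch['message']['attributes'][$name];
                unset($branch['message']['attributes'][$name]);
            }
        }

        $hash = md5(serialize($branch));

        if (!isset($seen[$hash])) {
            // first time this branch is seen: keep it under its original key
            $seen[$hash]  = $key;
            $unique[$key] = $branch;
            $unique[$key]['message']['attributes']['members'] = array();
        }

        // re-attach the removed values to the first occurrence, skipping repeats
        $first = $seen[$hash];
        foreach ($removed as $name => $value) {
            $known = isset($unique[$first]['message']['attributes']['members'][$name])
                ? $unique[$first]['message']['attributes']['members'][$name]
                : array();

            if (!in_array($value, $known, true)) {
                $unique[$first]['message']['attributes']['members'][$name][] = $value;
            }
        }
    }

    return $unique;
}

// Made-up sample data: 172 and 173 differ only in their "to" attribute.
$messages = array(
    172 => array('message' => array('attributes' => array('to' => 'alice', 'from' => 'bob'), 'body' => 'hello')),
    173 => array('message' => array('attributes' => array('to' => 'carol', 'from' => 'bob'), 'body' => 'hello')),
    174 => array('message' => array('attributes' => array('to' => 'alice', 'from' => 'bob'), 'body' => 'bye')),
);

$result = uniqueBranches($messages);
```

With this sample data, branches 172 and 173 collapse into one entry under key 172, whose members list holds both recipients, while 174 stays separate because its body differs.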
ehime
  • @JimMartens I was waiting the timeout period to accept my own answer and forgot about it already =/ – ehime Jul 29 '13 at 18:33