It's similar to this question, but slightly different, and I have found no answers so far. Take a look at this:

Array
(
    [0] => Array
    (
        ['id'] => abc
        ['value'] => XXX
    )

    [1] => Array
    (
        ['id'] => abc
        ['value'] => ooo
    )

    [2] => Array
    (
        ['id'] => abc
        ['value'] => qqq
    )

    [3] => Array
    (
        ['id'] => ghi
        ['value'] => YYY
    )

    [4] => Array
    (
        ['id'] => ghi
        ['value'] => jkl
    )

    [5] => Array
    (
        ['id'] => ghi
        ['value'] => XXX
    )

    [6] => Array
    (
        ['id'] => mno
        ['value'] => pql
    )
)

I want to identify all duplicate entries in a 2-d array (entries sharing the same id) and remove them according to a custom condition, similar to the comparison callback in uasort.

For example, take the first three elements, [0], [1] and [2]:

I want XXX to win over the others, so [1] and [2] will be removed.

The same goes for [3], [4] and [5]: they have the same id, but YYY wins over XXX and any other value.
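To make the rule concrete, it can be sketched as a comparison callback in the same spirit as the one passed to uasort. This is only a minimal sketch, not a definitive implementation: `$priority`, `compareValues()` and `dedupeById()` are hypothetical names, and it assumes that YYY beats XXX and XXX beats every other value, for every id.

```php
<?php
// Hypothetical priority map: a lower rank wins. Unlisted values share the lowest priority.
$priority = array('YYY' => 0, 'XXX' => 1);

// uasort-style comparator: negative if $a should win over $b, positive if $b should win.
function compareValues($a, $b, array $priority)
{
    $rankA = isset($priority[$a]) ? $priority[$a] : PHP_INT_MAX;
    $rankB = isset($priority[$b]) ? $priority[$b] : PHP_INT_MAX;
    if ($rankA == $rankB) {
        return 0;
    }
    return ($rankA < $rankB) ? -1 : 1;
}

// Keep one row per id: the row whose value wins according to the comparator.
// On a tie, the row that appears first is kept.
function dedupeById(array $rows, array $priority)
{
    $best = array(); // id => original key of the current winner
    foreach ($rows as $key => $row) {
        $id = $row['id'];
        if (!isset($best[$id])
            || compareValues($row['value'], $rows[$best[$id]]['value'], $priority) < 0) {
            $best[$id] = $key;
        }
    }

    $result = array();
    foreach ($best as $key) {
        $result[$key] = $rows[$key];
    }
    return $result;
}

// The array from the question
$rows = array(
    array('id' => 'abc', 'value' => 'XXX'),
    array('id' => 'abc', 'value' => 'ooo'),
    array('id' => 'abc', 'value' => 'qqq'),
    array('id' => 'ghi', 'value' => 'YYY'),
    array('id' => 'ghi', 'value' => 'jkl'),
    array('id' => 'ghi', 'value' => 'XXX'),
    array('id' => 'mno', 'value' => 'pql'),
);

$deduped = dedupeById($rows, $priority);
print_r($deduped); // the rows with keys 0, 3 and 6 remain
```

The comparator only ranks values, so the priority order can be changed without touching the loop that picks one row per id.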

Since this data is retrieved from a DBMS, one alternative I have is to run N different queries, passing each subsequent query the ids to exclude, e.g.:

  1. Find all elements with YYY
  2. Find all elements with XXX but that haven't been already found while looking for YYY
  3. Find all elements without YYY and XXX that haven't been already found while looking for YYY or XXX.
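The three steps above can be emulated in memory to check the logic before writing any queries. This is a rough sketch under stated assumptions: `dedupeInPasses()` is a hypothetical helper, each "query" is replaced by a loop over an already fetched array, and the final pass simply keeps the first remaining row per id.

```php
<?php
// Hypothetical helper: one pass per preferred value, then a final pass for the rest.
// Each pass skips ids that an earlier pass has already claimed.
function dedupeInPasses(array $rows, array $preferredValues)
{
    $found = array(); // id => original key of the row kept for that id

    // Steps 1 and 2: look for each preferred value, highest priority first.
    foreach ($preferredValues as $wanted) {
        foreach ($rows as $key => $row) {
            if ($row['value'] === $wanted && !isset($found[$row['id']])) {
                $found[$row['id']] = $key;
            }
        }
    }

    // Step 3: ids that matched no preferred value keep their first row.
    foreach ($rows as $key => $row) {
        if (!isset($found[$row['id']])) {
            $found[$row['id']] = $key;
        }
    }

    // Return the kept rows in their original order, with their original keys.
    return array_intersect_key($rows, array_flip($found));
}

// The array from the question
$rows = array(
    array('id' => 'abc', 'value' => 'XXX'),
    array('id' => 'abc', 'value' => 'ooo'),
    array('id' => 'abc', 'value' => 'qqq'),
    array('id' => 'ghi', 'value' => 'YYY'),
    array('id' => 'ghi', 'value' => 'jkl'),
    array('id' => 'ghi', 'value' => 'XXX'),
    array('id' => 'mno', 'value' => 'pql'),
);

$kept = dedupeInPasses($rows, array('YYY', 'XXX'));
print_r($kept); // the rows with keys 0, 3 and 6 remain
```

With real queries, the `$found` ids would be what gets passed to the next query as the exclusion list.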

Any help would be appreciated, hope it's all understandable.

Linuxatico

1 Answer

It is hard to give an efficient solution without knowing how complex your custom conditions are. If you know and can define them in advance, then you could do something as simple as:

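// Your custom conditions: preferred values per id
$master = array('abc' => array('XXX'), 'ghi' => array('YYY'));

// $dups = the array in your post

// Group values by id, keeping the original keys
$values = array();
foreach ( $dups as $k => $v ) {
    if ( is_array($v) && isset($v['id']) ) {
        $values[$v['id']][$k] = $v['value'];
    }
}

// If a master value exists in the group, use it, otherwise use the first value given
$deduped = array();
foreach ( $values as $k => $v ) {
    // array_intersect() keeps the order of $master[$k]. Storing its result in a
    // variable avoids passing a function result by reference, which
    // array_shift(array_intersect(...)) would trigger a notice for.
    $preferred = isset($master[$k]) ? array_intersect($master[$k], $v) : array();
    $value = $preferred ? reset($preferred) : reset($v);
    // Index the result by the original key of the winning element
    $deduped[array_search($value, $v)] = array('id' => $k, 'value' => $value);
}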

Which gives you...

Array
(
    [0] => Array
        (
            [id] => abc
            [value] => XXX
        )

    [3] => Array
        (
            [id] => ghi
            [value] => YYY
        )

    [6] => Array
        (
            [id] => mno
            [value] => pql
        )

)

Looping in PHP can be much quicker than running extra MySQL queries in many circumstances, so you would need to test with your actual data to see whether looping is quicker than separate queries.
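A rough way to time the PHP side is sketched below. `averageSeconds()` is a hypothetical helper and the rows are synthetic stand-ins, not real data; the same helper could wrap the code that runs the separate queries so the two timings can be compared.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn over $runs runs.
function averageSeconds($fn, $runs)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($fn);
    }
    return (microtime(true) - $start) / $runs;
}

// Synthetic stand-in rows (an assumption): 10000 rows spread over 100 ids.
$rows = array();
for ($i = 0; $i < 10000; $i++) {
    $rows[] = array('id' => 'id' . ($i % 100), 'value' => 'v' . $i);
}

// Time the grouping loop used in the answer.
$loopTime = averageSeconds(function () use ($rows) {
    $values = array();
    foreach ($rows as $k => $v) {
        $values[$v['id']][$k] = $v['value'];
    }
    return $values;
}, 20);

printf("PHP loop: %.6f s per run\n", $loopTime);
```

Timings from synthetic rows only indicate the order of magnitude, so the measurement is worth repeating on the real result set.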

Dave
  • Your solution is indeed correct, I am studying your way of mastering these built-in php functions, it's very interesting. Thank you. Maybe I'll update the question with benchmark php vs sql ordering – linuxatico Oct 10 '13 at 07:54