
Firstly, I realise this may appear to be a duplicate, as I have read a number of questions on a similar topic (1, 2), but I'm struggling to see how to re-architect my code base to fit my scenario.

I am attempting to take an existing multi-dimensional array and remove any nodes that have a duplicate in a specific field. Here is the dataset I am working with:

array(3) {
  [0]=>
  array(3) {
    ["company"]=>
    string(9) "Company A"
    ["region"]=>
    string(4) "EMEA"
    ["ctype"]=>
    string(8) "Customer"
  }
  [1]=>
  array(3) {
    ["company"]=>
    string(9) "Company A"
    ["region"]=>
    string(4) "EMEA"
    ["ctype"]=>
    string(8) "Customer"
  }
  [2]=>
  array(3) {
    ["company"]=>
    string(9) "Company C"
    ["region"]=>
    string(4) "EMEA"
    ["ctype"]=>
    string(8) "Customer"
  }
}

If this weren't a multi-dimensional array I would use in_array() to see whether the ['company'] value already existed. If not, I'd add it to my $unique array, something like this:

$unique = array();

foreach ($dataset as $company) {
  $company_name = $company['company'];

  if ( !in_array($company_name, $unique) ) {
    array_push($unique, $company_name);
  }
}
var_dump($unique);

But I'm unsure how to traverse the multi-dimensional array to get at the ['company'] data and see if it already exists (as it is the only field I need to check for duplicates).

I am looking to output exactly the same data structure as the initial dataset, just with the duplicates removed. Can you please point me in the right direction?

Sheixt

4 Answers


Store already-checked companies in a side array:

$unique = array();
$companies = array();

foreach ($dataset as $company) {
    $company_name = $company['company'];

    if ( !in_array($company_name, $companies) ) {
        array_push($unique, $company);
        array_push($companies, $company_name);
    }
}

var_dump($unique);
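A variation on the same idea (a sketch, not part of the original answer): using the company name as the array key turns the duplicate check into an O(1) isset() instead of an in_array() scan over the side array:

```php
<?php
// Same dataset as in the question.
$dataset = array(
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company C", "region" => "EMEA", "ctype" => "Customer"),
);

$unique = array();
foreach ($dataset as $company) {
    // Key the result by company name; the first occurrence wins.
    if (!isset($unique[$company['company']])) {
        $unique[$company['company']] = $company;
    }
}

// Re-index numerically so the output matches the input structure.
$unique = array_values($unique);
var_dump($unique);
```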
u_mulder

Use array_filter() with a closure that captures a by-reference array via the use keyword.

>>> $data
=> [
       [
           "company" => "Company A",
           "region"  => "EMEA",
           "ctype"   => "Customer"
       ],
       [
           "company" => "Company A",
           "region"  => "EMEA",
           "ctype"   => "Customer"
       ],
       [
           "company" => "Company C",
           "region"  => "EMEA",
           "ctype"   => "Customer"
       ]
   ]
$whitelist = [];

array_filter($data, function ($item) use (&$whitelist) {
  if (!in_array($item['company'], $whitelist)) {
    $whitelist[] = $item['company'];
    return true;
  }
  return false;
});

=> [
       0 => [
           "company" => "Company A",
           "region"  => "EMEA",
           "ctype"   => "Customer"
       ],
       2 => [
           "company" => "Company C",
           "region"  => "EMEA",
           "ctype"   => "Customer"
       ]
   ]
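One note outside the REPL (where the `=>` echo above is the expression's return value): array_filter() does not modify $data in place, so in a script you must capture its return value. It also preserves keys (0 and 2 here), so re-index with array_values() if you need a contiguous list:

```php
<?php
$data = array(
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company C", "region" => "EMEA", "ctype" => "Customer"),
);

$whitelist = array();

// Capture the return value: array_filter() leaves $data untouched.
$filtered = array_filter($data, function ($item) use (&$whitelist) {
    if (!in_array($item['company'], $whitelist)) {
        $whitelist[] = $item['company'];
        return true;
    }
    return false;
});

// Keys 0 and 2 survive the filter; re-index them to 0 and 1.
$filtered = array_values($filtered);
```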
markcial
  • Whilst I follow the concept of this, I'm not getting the right output. The returned data is: array(2) { [0]=> string(9) "Company A" [1]=> string(9) "Company C" } – Sheixt Jan 13 '15 at 12:57
  • Due to the PHP version running on the server I had to amend `$whitelist = [];` to `$whitelist = array();` I assume this has no impact? – Sheixt Jan 13 '15 at 13:01
  • Not at all, it's the same array, just a different style of declaration. – markcial Jan 14 '15 at 10:26

To rebuild an array without duplicates:

$result = array();
foreach($datas as $data){
  foreach($data as $key => $value){
    $result[$key][$value] = $value;
  }
}

print_r($result);

OUTPUT:

Array
(
    [company] => Array
        (
            [Company A] => Company A
            [Company C] => Company C
        )

    [region] => Array
        (
            [EMEA] => EMEA
        )

    [ctype] => Array
        (
            [Customer] => Customer
        )

)

Keeping the same architecture:

$datas = array(
  array(
    "company"=>"Company A",
    "region"=>"EMEA",
    "ctype"=>"Customer"
  ),
  array(
    "company"=>"Company A",
    "region"=>"EMEA",
    "ctype"=>"Customer"
  ),
  array(
    "company"=>"Company C",
    "region"=>"EMEA",
    "ctype"=>"Customer"
  )
);

function removeDuplicateOnField($datas, $field){
  $seen = array();

  foreach($datas as $key => $data){
      if(isset($data[$field]) && !isset($seen[$data[$field]])){
        $seen[$data[$field]] = true;
      }
      else {
        unset($datas[$key]);
      }
  }
  return $datas;
}

$result = removeDuplicateOnField($datas, "company");

print_r($result);
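As a side note (a sketch assuming PHP 5.5+ for array_column()), the same field-based de-duplication fits in one expression: array_column() pulls the "company" values keyed by row index, array_unique() keeps the first index per company, and array_intersect_key() selects those rows from the original array:

```php
<?php
$datas = array(
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company C", "region" => "EMEA", "ctype" => "Customer"),
);

// Keep the first row for each distinct "company" value,
// then re-index the surviving rows numerically.
$result = array_values(
    array_intersect_key($datas, array_unique(array_column($datas, 'company')))
);
```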
Spoke44
  • Ah, I'm looking to keep the same data structure for the output as was inserted (i.e. as the same as in the first code snippet in the question). This is because I will need to manipulate the data with the associations that exist. – Sheixt Jan 13 '15 at 13:12

What you seem to be describing is something PHP can already cater for. Have you heard of the array_unique function before? It doesn't work recursively, but while browsing through the PHP docs I found a user-contributed function that does.

recursive array unique for multiarrays

function super_unique($array)
{
  $result = array_map("unserialize", array_unique(array_map("serialize", $array)));

  foreach ($result as $key => $value)
  {
    if ( is_array($value) )
    {
      $result[$key] = super_unique($value);
    }
  }

  return $result;
}
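For reference, the core of that function is the serialize/array_unique trick on its own. Note that it removes a row only when every field matches, since each whole sub-array is serialize()d to a string, array_unique() drops repeated strings, and the survivors are unserialize()d back. That makes it stricter than a company-only check (two rows with the same company but different regions would both survive):

```php
<?php
$dataset = array(
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company A", "region" => "EMEA", "ctype" => "Customer"),
    array("company" => "Company C", "region" => "EMEA", "ctype" => "Customer"),
);

// Rows 0 and 1 serialize to identical strings, so array_unique()
// drops row 1; keys 0 and 2 remain.
$rows = array_map("unserialize", array_unique(array_map("serialize", $dataset)));
```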

Let me know if this works, as I am currently out of the office at the moment.

GNewton
  • This isn't far away. It seems to de-dupe, however there is still an erroneous (empty) array at the end of the data: ` array(3) { ["company"]=> string(9) "Company A" ["region"]=> string(4) "EMEA" ["ctype"]=> string(8) "Customer" } array(3) { ["company"]=> string(9) "Company C" ["region"]=> string(4) "EMEA" ["ctype"]=> string(8) "Customer" } array(2) { [0]=> NULL [2]=> NULL }` – Sheixt Jan 13 '15 at 13:09
  • It was a 'shot in the dark' sort of thing while i was browsing through on my laptop, as I wasn't at my normal development machine. I see in the comments above, you have found your answer. Happy coding! – GNewton Jan 13 '15 at 19:21