Remove wholly duplicate rows from a 2d array and count the number of occurrences of each unique row

Question

I need to filter out duplicate rows in my 2d array and, in each retained unique row, append an element containing the count of how many times that row occurred in the original array.

I wanted to use array_unique($array, SORT_REGULAR), but removing duplicates is not enough: I also need to store the number of duplicates alongside each unique row.

I have tried array_search() and loops, but none of my attempts yield the correct results. My project data has upwards of 500,000 entries, but here's a basic example:

Input:

[
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true'],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true'],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'false'],
]

Output:

[
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true', 'count' => 2],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'false', 'count' => 1],
]

You need a way to identify each object. Then you can loop over the original array, checking each element (object) and counting how many copies of each unique object are found. — José Carlos PHP, Nov 05 '22 at 10:49
@JoséCarlosPHP I tried that, but given the size of the array, looping is very costly. Could you be more specific about the counting? Is it going to be a loop inside a loop? — Fasna, Nov 05 '22 at 10:57
array_count_values($yourArray) will return an array with your object and number of occurrences — Khaled Hassan, Nov 05 '22 at 11:03
@KhaledHassan array_count_values(): Can only count STRING and INTEGER values! My rows are arrays, not strings or integers, so I can't use array_count_values(); I already tried that. Thanks — Fasna, Nov 05 '22 at 11:05
You comment above: "but since the size of the array, looping is very costly". Sorry, but what do you think functions like `array_unique()` or `array_search()` do, if not loop/iterate over the array? There is no way around that. — arkascha, Nov 05 '22 at 11:06
@arkascha Yes, you're correct. array_search() also takes a good amount of time. That's why I'm searching for an alternative solution; if there's no way around it, I'll have to do the loop. — Fasna, Nov 05 '22 at 11:11
If the data comes from a database, you could use `GROUP BY` and `COUNT()`: `SELECT a,b,c,count(a) FROM table GROUP BY a,b,c` — Michel, Nov 05 '22 at 12:19
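Khaled Hassan's `array_count_values()` idea can still work if each row is first turned into a string, because the function only rejects values that are not strings or integers. A minimal sketch, assuming every row is an array of scalars that `serialize()`/`unserialize()` round-trip unchanged (true for the question's string values):

```php
<?php
// Sample rows from the question.
$array = [
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true'],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true'],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'false'],
];

// array_count_values() only accepts string/int values, so convert each row
// to a string first; serialize() preserves keys, key order and value types.
$counts = array_count_values(array_map('serialize', $array));

// Rebuild each unique row from its serialized key and append its count.
$result = [];
foreach ($counts as $serialized => $count) {
    // allowed_classes => false: we only expect plain arrays here.
    $row = unserialize($serialized, ['allowed_classes' => false]);
    $result[] = $row + ['count' => $count];
}

var_export($result);
```

This is still one pass over the data plus one pass over the unique keys, so, as arkascha points out, the iteration is hidden inside built-in functions rather than avoided.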

score 1 · Answer 1 · answered Nov 05 '22 at 12:06

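If I understand you correctly, this should help. Each row is keyed by an md5 hash of its serialized form, so identical rows land on the same key and their count is incremented in a single pass: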

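function getUniqWithCounts(array $data): array
{
    $result = [];
    foreach ($data as $item) {
        // Identical rows serialize to the same string, hence the same hash.
        $hash = md5(serialize($item));

        if (isset($result[$hash])) {
            // Seen before: just bump the counter.
            $result[$hash]['count']++;
            continue;
        }
        // First occurrence: keep the row and start its count at 1.
        $item['count'] = 1;
        $result[$hash] = $item;
    }

    // Discard the hash keys and return a plain list.
    return array_values($result);
}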
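The `md5()` call is optional: `serialize()` already yields one distinct string per distinct row, and PHP array keys can be arbitrarily long strings. The sketch below is a variant of the same single-pass approach that keys on the serialized string directly, run against the question's sample; `uniqueRowsWithCounts()` is a hypothetical name. Hashing trades some CPU time for shorter keys, which may matter for memory with wide rows across 500,000 entries, so it is worth measuring both variants on real data.

```php
<?php
/**
 * Group identical rows and append a 'count' column.
 * Hypothetical variant of getUniqWithCounts() that keys on serialize()
 * directly instead of md5(serialize()).
 */
function uniqueRowsWithCounts(array $data): array
{
    $result = [];
    foreach ($data as $item) {
        // Rows with the same keys, order, values and types serialize identically.
        $key = serialize($item);
        if (isset($result[$key])) {
            $result[$key]['count']++;
        } else {
            $result[$key] = $item + ['count' => 1];
        }
    }
    // Drop the string keys so the result is a plain list.
    return array_values($result);
}

$rows = [
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true'],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'true'],
    ['manufacturer' => 'KInd', 'brand' => 'ABC', 'used' => 'false'],
];
print_r(uniqueRowsWithCounts($rows));
```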

mickmackusa · Accepted Answer · 2022-11-07T01:17:40.933

You don't need to use any elaborate serialization or encoding to create composite keys for grouping. Just implode each row's values (assuming they all contain the same columns in the same order) to create an identifying key for the result array.

On the first encounter, store the row's data in the group and set the group's count to 1; on any subsequent encounter, increment the group's counter.

Code:

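$result = [];
foreach ($array as $row) {
    // Join the row's values into a key shared by all identical rows.
    $compositeKey = implode('_', $row);
    if (!isset($result[$compositeKey])) {
        // First encounter: store the row with a count of 1.
        $result[$compositeKey] = $row + ['count' => 1];
    } else {
        // Subsequent encounter: increment the group's counter.
        ++$result[$compositeKey]['count'];
    }
}
var_export(array_values($result));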

Output:

array (
  0 => 
  array (
    'manufacturer' => 'KInd',
    'brand' => 'ABC',
    'used' => 'true',
    'count' => 2,
  ),
  1 => 
  array (
    'manufacturer' => 'KInd',
    'brand' => 'ABC',
    'used' => 'false',
    'count' => 1,
  ),
)
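One caveat with `implode('_', $row)`: if a value can itself contain `_`, two different rows can produce the same key (`['a_b', 'c']` and `['a', 'b_c']` both become `a_b_c`) and be wrongly merged. A minimal sketch of a collision-safe variant, swapping in `json_encode()` for `implode()` as the key builder and keeping the same assumption that all rows have the same columns in the same order:

```php
<?php
// Two different rows whose values collide when joined with '_'.
$rows = [
    ['manufacturer' => 'a_b', 'brand' => 'c'],
    ['manufacturer' => 'a', 'brand' => 'b_c'],
];

// Both rows produce the same implode() key: "a_b_c".
var_dump(implode('_', $rows[0]) === implode('_', $rows[1])); // bool(true)

// json_encode() quotes and escapes every value, so distinct rows get distinct keys.
// JSON_THROW_ON_ERROR (PHP 7.3+) throws on invalid UTF-8 instead of returning false.
$result = [];
foreach ($rows as $row) {
    $compositeKey = json_encode(array_values($row), JSON_THROW_ON_ERROR);
    if (!isset($result[$compositeKey])) {
        $result[$compositeKey] = $row + ['count' => 1];
    } else {
        ++$result[$compositeKey]['count'];
    }
}
var_export(array_values($result));
```

Unlike `implode()`, `json_encode()` distinguishes types, so the integer `1` and the string `'1'` produce different keys; cast the values first if they should be grouped together.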

