Delete Perl Hash Values if duplication found

Question

I need some quick help. I'm grepping the output of some commands on a Unix server and building a hash from the results.

The issue is that, most of the time, duplicate values come through, and I want to remove any duplicate values from this hash.

Here is an example:

[randy@server04 ~/scripts]$ perl snmpperl.pl 
$VAR1 = {
    '1b' => [
        'abc_pl',
        'abc_pl',
        'abc_pl',
        'xyz_pl',
        'xyz_pl',
    ],
    '1a' => [
        'abc_pl',
        'abc_pl',
        'abc_pl',
        'abc_pl',
        'xyz_pl',
        'xyz_pl',
    ]
};

I need the hash to be:

$VAR1 = {
    '1b' => [
        'abc_pl',
        'xyz_pl',
    ],
    '1a' => [
        'abc_pl',
        'xyz_pl',
    ]
};

possible duplicate of [how-do-i-remove-duplicate-items-from-an-array-in-perl](http://stackoverflow.com/questions/7651/how-do-i-remove-duplicate-items-from-an-array-in-perl) — Joe, Jun 14 '14 at 06:03
if the order of the values is not relevant you can replace the array with a hash (multidimensional hash) — Pierre, Jun 14 '14 at 09:18
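
The suggestion in the comment above can be sketched roughly as follows, assuming the order of the values really does not matter. The sample data is taken from the question; `%set` is a name made up for this illustration.

```perl
use strict;
use warnings;

my %hash = (
  '1b' => [ 'abc_pl', 'abc_pl', 'abc_pl', 'xyz_pl', 'xyz_pl' ],
  '1a' => [ 'abc_pl', 'abc_pl', 'abc_pl', 'abc_pl', 'xyz_pl', 'xyz_pl' ],
);

# Build a hash of hashes: each value becomes an inner key,
# so repeated values collapse into a single entry.
my %set;
for my $k ( keys %hash ) {
  $set{$k}{$_} = 1 for @{ $hash{$k} };
}

# Inner keys come back in no particular order, so sort them for display.
for my $k ( sort keys %set ) {
  print $k, ': ', join( ', ', sort keys %{ $set{$k} } ), "\n";
}
```

The trade-off is that the original ordering of the values is lost, which is why this only fits when order is irrelevant.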

DavidO · Accepted Answer · 2014-06-14T06:46:22.717

This is a relatively common Perl idiom, and it is addressed in the FAQ, which you can find by running `perldoc -q duplicate` on any system with Perl installed.

Here is an adaptation of the ideas expressed in the FAQ:

use strict;
use warnings;
use Data::Dumper;

my %hash = (
  '1b' => [ 'abc_pl', 'abc_pl', 'abc_pl', 'xyz_pl', 'xyz_pl', ],
  '1a' => [ 'abc_pl', 'abc_pl', 'abc_pl', 'abc_pl', 'xyz_pl', 'xyz_pl', ],
);

foreach my $v ( values %hash ) {
  my %seen;
  @$v = grep { !$seen{$_}++ } @$v;
}

print Dumper \%hash;

This works by tracking, in %seen, whether each element of the sub-array for a given hash key has been seen before. The first time an element appears, $seen{$_} is not yet set, so !$seen{$_}++ is true and the element passes through the grep filter. On every later appearance the count is nonzero, so the element is dropped. In the end, each array holds a single instance of each element, in its original order.

One nuance worth mentioning: the loop variable in a foreach loop becomes an alias to the element it represents. So in this case, on each iteration of the loop, $v aliases a hash value that holds an anonymous array reference. We simply replace the contents of that anonymous array with the de-duped elements.
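
To illustrate the aliasing point, here is a small sketch (not part of the original answer, and using shortened sample data) that assigns a brand-new array reference to $v instead of overwriting the existing array. Because $v is an alias to the hash value, the hash itself ends up holding the new reference.

```perl
use strict;
use warnings;

my %hash = (
  '1b' => [ 'abc_pl', 'abc_pl', 'xyz_pl' ],
  '1a' => [ 'abc_pl', 'xyz_pl', 'xyz_pl' ],
);

# $v is an alias to each hash value, so assigning a new array
# reference to $v replaces the value stored in %hash itself.
foreach my $v ( values %hash ) {
  my %seen;
  $v = [ grep { !$seen{$_}++ } @$v ];
}

print $_, ': ', join( ' ', @{ $hash{$_} } ), "\n" for sort keys %hash;
```

Had $v been a copy rather than an alias, the assignment would have changed only the copy and left %hash untouched.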

mpapec · Answer 2 · 2014-06-14T06:27:48.850

use List::MoreUtils 'uniq';

@$_ = uniq @$_ for values %hash;

A replacement for uniq, if List::MoreUtils is not available:

sub uniq (@) {
    my %seen;
    grep !$seen{$_}++, @_;
}
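
Putting the two pieces together, a self-contained sketch that needs no CPAN module might look like this. The sample data is from the question; setting `$Data::Dumper::Sortkeys` is an addition here, only to make the printed output stable.

```perl
use strict;
use warnings;
use Data::Dumper;

# Pure-Perl stand-in for List::MoreUtils::uniq: keeps the first
# occurrence of each element and preserves the original order.
sub uniq {
  my %seen;
  return grep { !$seen{$_}++ } @_;
}

my %hash = (
  '1b' => [ 'abc_pl', 'abc_pl', 'abc_pl', 'xyz_pl', 'xyz_pl' ],
  '1a' => [ 'abc_pl', 'abc_pl', 'abc_pl', 'abc_pl', 'xyz_pl', 'xyz_pl' ],
);

# De-duplicate every array in place.
@$_ = uniq(@$_) for values %hash;

$Data::Dumper::Sortkeys = 1;    # print keys in sorted order
print Dumper \%hash;
```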

edited Jun 14 '14 at 06:27

answered Jun 14 '14 at 06:21

List::MoreUtils has the advantage of being 2 to 3 times faster for lists with many duplicates (and about the same for lists without). https://gist.github.com/schwern/6592165547b4ea0f65ef – Schwern Jun 14 '14 at 19:04
@Schwern tnx for reference. Btw, can you share your view on what utility modules you find useful (like `Path::Tiny`?) – mpapec Jun 14 '14 at 21:14
I can do one better. I can put them into a cohesive module called [perl5i](https://metacpan.org/pod/perl5i). In addition to perl5i: Path::Tiny, Method::Signatures and Mouse. – Schwern Jun 15 '14 at 00:41
@Schwern I didn't expect you'll write one right away. `:)` Tnx for module which could go in the core. – mpapec Jun 15 '14 at 21:19
