remove duplicate values for a key in hash

Question

I have the following code

chdir("c:/perl/normalized");
$docid=0;
my %hash = ();
@files = <*>;
foreach $file (@files)
{
    $docid++;
    open (input, $file);
    while (<input>)
    {
        open (output,'>>c:/perl/tokens/total');
        chomp;
        (@words) = split(" ");
        foreach $word (@words)
        {
            push @{ $hash{$word} }, $docid;
        }
    }
}
foreach $key (sort keys %hash) {
    print output"$key : @{ $hash{$key} }\n";
}

close (input);
close (output);

This is a sample output in a file

of : 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 4 4 4 4 5 6 6 7 7 7 7 7 7 7 7 7

This is correct, since the term "of", for example, occurs 10 times (ten ones) in the first document. However, is there a way to remove the repeated values? That is, instead of ten ones I want just one. Thank you for your help.

Before adding it, check if it's already in the hash. Or am I missing something here? — Madbreaks, Nov 06 '12 at 18:31
It has been [asked before](http://stackoverflow.com/questions/7651/how-do-i-remove-duplicate-items-from-an-array-in-perl). Please do a search before posting another question of the same ilk. — hd1, Nov 06 '12 at 18:35
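
The check suggested in the first comment could look like the minimal sketch below. It is not the asker's code: it assumes the documents are passed in as strings rather than read from files, and the names `index_docs`, `@docs` and `%seen` are made up for illustration.

```perl
use strict;
use warnings;

# Build word => [doc ids], recording each doc id at most once per word.
# @docs stands in for the files: one string per document (assumption).
sub index_docs {
    my @docs = @_;
    my (%hash, %seen);
    my $docid = 0;
    for my $text (@docs) {
        $docid++;
        for my $word (split ' ', $text) {
            # Skip the push if this word was already recorded for this document.
            next if $seen{$word}{$docid}++;
            push @{ $hash{$word} }, $docid;
        }
    }
    return \%hash;
}

my $index = index_docs("the end of the end of it", "of mice", "no match");
for my $key (sort keys %$index) {
    print "$key : @{ $index->{$key} }\n";   # e.g. "of : 1 2"
}
```

Because `%seen` is keyed by both word and doc id, a word repeated on different lines of the same document is still recorded only once.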

score 5 · Accepted Answer · answered Nov 06 '12 at 19:07

To avoid adding the dups in the first place, change

foreach $word (@words)

to

foreach $word (uniq @words)

If you want to leave the dups in the data structure, instead change

print output"$key : @{ $hash{$key} }\n";

to

print output "$key : ", join(" ", uniq @{ $hash{$key} }), "\n";

uniq is provided by List::MoreUtils.

use List::MoreUtils qw( uniq );

Or you can use

sub uniq { my %seen; grep !$seen{$_}++, @_ }
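
As a quick check of the print-time variant, here is a small self-contained sketch that uses the hand-written `uniq`. The doc id list is shortened from the sample output in the question, not real data. One caveat: in the question's loop `@words` appears to hold the words of a single line, so `uniq @words` would only remove repeats within that line, while applying `uniq` when printing removes all of them.

```perl
use strict;
use warnings;

# Keep the first occurrence of each value, preserving order.
sub uniq { my %seen; grep !$seen{$_}++, @_ }

# Doc ids for "of", shortened from the sample output in the question.
my %hash = ( of => [ 1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6, 7 ] );

for my $key (sort keys %hash) {
    print "$key : ", join(" ", uniq @{ $hash{$key} }), "\n";   # of : 1 2 3 4 5 6 7
}
```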

ikegami


Is there a way to keep a counter for the removed duplicates? – user1804029 Nov 07 '12 at 08:57
Best bet might be to use a hash instead of an array, and maintain a count as the value of the hash. `++$hash{$word}{$docid};` Use `keys` to get the doc ids. You'll lose the order, but it can easily be restored using a numerical sort. – ikegami Nov 07 '12 at 09:01
No. You're storing the doc id in an array value (`$hash{$word}[$i] = $docid;`). I suggested you store it in a hash key (`$hash{$word}{$docid} = $count;`), and I showed you how to do it. – ikegami Nov 07 '12 at 09:15
The `uniq` removes duplicates, which is good, but I want to keep a count of the removed duplicates. For example, the output should be 1(10) 2(7) 3(2) 4(4), etc. – user1804029 Nov 07 '12 at 09:23
Not only did I hear you the first time, I answered you. If you have problems implementing it, please post a new question. I can't go into details in comments, and I'm not a code writing service. – ikegami Nov 07 '12 at 09:27
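
The hash-of-hashes approach from the comments can be sketched roughly as follows. This is one possible reading of the suggestion, not code posted in the thread: the helper names `count_docs` and `format_counts` are hypothetical, and the documents are passed in as strings instead of being read from files.

```perl
use strict;
use warnings;

# Count occurrences per word and document: $hash{$word}{$docid} = count.
# @docs stands in for the files: one string per document (assumption).
sub count_docs {
    my @docs = @_;
    my %hash;
    my $docid = 0;
    for my $text (@docs) {
        $docid++;
        ++$hash{$_}{$docid} for split ' ', $text;
    }
    return \%hash;
}

# Format one word's entry as "docid(count) ...", doc ids in numerical order.
sub format_counts {
    my ($counts) = @_;
    return join " ",
        map  { sprintf "%d(%d)", $_, $counts->{$_} }
        sort { $a <=> $b } keys %$counts;
}

my $index = count_docs("of of of the", "the of", "of of");
for my $key (sort keys %$index) {
    print "$key : ", format_counts($index->{$key}), "\n";
}
# of : 1(3) 2(1) 3(2)
# the : 1(1) 2(1)
```

The numerical sort restores the doc id order that is lost when the ids become hash keys.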
