I have the following code
chdir("c:/perl/normalized");
$docid=0;
my %hash = ();
@files = <*>;
foreach $file (@files)
{
$docid++;
open (input, $file);
while (<input>)
{
open (output,'>>c:/perl/tokens/total');
chomp;
(@words) = split(" ");
foreach $word (@words)
{
push @{ $hash{$word} }, $docid;
}
}
}
foreach $key (sort keys %hash) {
print output"$key : @{ $hash{$key} }\n";
}
close (input);
close (output);
This is a sample output in a file
of : 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 4 4 4 4 5 6 6 7 7 7 7 7 7 7 7 7
it is true since the term "of" for example existed 10(ten ones) times in the first document however is there a way to remove the repeated values; i.e instead of ten ones I want just one Thank you for your help