perl code to remove duplicate entries from a file

Question

I have a file (say bugs.txt) that is generated by running some code. This file contains a list of JIRA IDs. I want to write code that removes duplicate entries from this file.

The logic should be generic, as the contents of bugs.txt will be different every time.

Sample input file bugs.txt:

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

Sample output:

BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234

My trial code:

my $file1="/path/to/file/bugs.txt";
my $Jira_nums;
open(FH, '<', $file1) or die $!;
  {
    local $/;
    $Jira_nums = <FH>;
  }
close FH;

I need help designing the logic to remove the duplicate entries from bugs.txt.

Possible duplicate of https://stackoverflow.com/questions/5884401/perl-find-duplicate-lines-in-file-or-array — AbhiNickz, Aug 04 '17 at 10:09
Is it a one line file? If not, do you want to remove dups that exist on different lines? — Toto, Aug 04 '17 at 11:32
Yes @Toto, this may be a single-line or a multi-line file. The idea is to remove duplicate entries from the entire file. — Yash, Aug 04 '17 at 12:24

score 1 · Accepted Answer · answered Aug 04 '17 at 13:03

You just need to add these lines to your script:

my %seen;
my @no_dups = grep { !$seen{$_}++ } split /,?\s/, $Jira_nums;
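
The `grep { !$seen{$_}++ }` idiom works because the post-increment returns the old count: 0 the first time a value is seen (Perl treats the missing hash entry as 0), so `!` makes it true and grep keeps the element; on later occurrences the old count is 1 or more, so the element is dropped. A small standalone sketch on a hard-coded list (the list is made up for illustration):

```perl
use strict;
use warnings;

my @ids = ('BUG-111', 'BUG-122', 'BUG-111', 'JIRA-221', 'BUG-122');

my %seen;
# $seen{$_}++ returns the previous count (0 on first sight),
# so only the first occurrence of each ID passes the grep
my @unique = grep { !$seen{$_}++ } @ids;

print join(', ', @unique), "\n";    # prints "BUG-111, BUG-122, JIRA-221"

# As a side effect, %seen now holds how often each ID occurred
print "$_ => $seen{$_}\n" for sort keys %seen;
```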

The complete script then becomes:

use strict;
use warnings;
use Data::Dumper;

my $file1 = "/path/to/file/bugs.txt";
my $Jira_nums;
open(my $FH, '<', $file1) or die $!;   # use a lexical filehandle
{
    local $/;              # slurp mode: read the whole file at once
    $Jira_nums = <$FH>;
}
close $FH;

my %seen;
my @no_dups = grep { !$seen{$_}++ } split /,?\s/, $Jira_nums;
print Dumper \@no_dups;    # 'say' would require "use feature 'say'"

For input data like:

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

it gives:

$VAR1 = [
          'BUG-111',
          'BUG-122',
          'BUG-123',
          'JIRA-221',
          'JIRA-234'
        ];
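
The script above prints the unique IDs but does not update bugs.txt itself. Below is a minimal sketch that reads the whole file, splits on any run of commas and whitespace (so blank lines, CRLF line endings or leading spaces do not produce empty entries), and writes the unique IDs back to the same file. The sub name `dedup_file` is hypothetical, and rewriting the file as a single comma-separated line is an assumption about the desired output format.

```perl
use strict;
use warnings;

# Hypothetical helper: remove duplicate IDs from the file at $path, keeping
# first-occurrence order, and rewrite the file as one comma-separated line
# (the single-line output format is an assumption).
sub dedup_file {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> } // '';   # slurp the whole file
    close $in;

    my %seen;
    # Split on any run of commas/whitespace; length($_) drops a leading
    # empty field, and %seen keeps only the first occurrence of each ID
    my @unique = grep { length($_) && !$seen{$_}++ } split /[,\s]+/, $content;

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} join(', ', @unique), "\n";
    close $out or die "Cannot close $path: $!";

    return @unique;
}

# Usage: perl dedup.pl /path/to/file/bugs.txt
dedup_file($ARGV[0]) if @ARGV;
```

Overwriting the input in place keeps the example short; writing to a separate file first is safer if the original must be preserved.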

Thanks @Toto, the proposed one-liner solution worked for me. — Yash, Aug 11 '17 at 04:33

ssr1012 · Answer 2 · score 0 · 2017-08-04T10:28:22.053

You can try this:

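use strict;
use warnings;

my @bugs;
# push (rather than assign) so IDs from every line are kept, not just the last line
push @bugs, grep { length($_) } split /[,\s]+/, $_ while (<DATA>);
my @Sequenced = RemoveDup(@bugs);

print "@Sequenced\n";

# Keep the first occurrence of each element, in input order
sub RemoveDup { my %checked; grep { !$checked{$_}++ } @_ }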


__DATA__
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
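
If List::Util 1.45 or newer is available (recent Perl releases bundle a version that has it), its `uniq` function performs the same first-occurrence-preserving deduplication as `RemoveDup`. A sketch that collects the IDs from every line before deduplicating; the hard-coded `@lines` array stands in for the contents of bugs.txt and is an assumption for illustration:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq was added in List::Util 1.45

# Stand-in for the lines of bugs.txt (the IDs span several lines)
my @lines = (
    "BUG-111, BUG-122, BUG-123, BUG-111\n",
    "BUG-123, JIRA-221, JIRA-234, JIRA-221\n",
);

# Collect IDs from every line, skipping empty fields, then keep the first of each
my @ids    = grep { length($_) } map { split /[,\s]+/, $_ } @lines;
my @unique = uniq @ids;

print join(', ', @unique), "\n";   # prints "BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234"
```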

edited Aug 04 '17 at 10:28

answered Aug 04 '17 at 10:12
