I have a file (say bugs.txt) that is generated by running some code. It contains a list of JIRA IDs. I want to write code that removes the duplicate entries from this file.

The logic should be generic, as the contents of bugs.txt will be different every time.

Sample input file bugs.txt:

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

Sample output:

BUG-111, BUG-122, BUG-123, JIRA-221, JIRA-234

My trial code:

my $file1="/path/to/file/bugs.txt";
my $Jira_nums;
open(FH, '<', $file1) or die $!;
  {
    local $/;
    $Jira_nums = <FH>;
  }
close FH;

I need help designing the logic to remove duplicate entries from the file bugs.txt.

Yash
  • Possible duplicate of https://stackoverflow.com/questions/5884401/perl-find-duplicate-lines-in-file-or-array – AbhiNickz Aug 04 '17 at 10:09
  • Is it a one line file? If not, do you want to remove dups that exist on different lines? – Toto Aug 04 '17 at 11:32
  • yes @Toto, this may be a single or multiple line file. Idea is to remove duplicate entries from the entire file. – Yash Aug 04 '17 at 12:24

2 Answers

You just need to add these lines to your script:

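my %seen;
# Split on an optional comma followed by whitespace, then keep only the first occurrence of each ID.
my @no_dups = grep { !$seen{$_}++ } split /,?\s+/, $Jira_nums;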

You'll get:

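use strict;
use warnings;
use Data::Dumper;

my $file1 = "/path/to/file/bugs.txt";
my $Jira_nums;
open(my $FH, '<', $file1) or die "Cannot open $file1: $!"; # use a lexical filehandle
  {
    local $/;             # slurp mode: read the whole file at once
    $Jira_nums = <$FH>;
  }
close $FH;

my %seen;
# Split on an optional comma followed by whitespace, then keep only the first occurrence of each ID.
my @no_dups = grep { !$seen{$_}++ } split /,?\s+/, $Jira_nums;
print Dumper \@no_dups;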

For input data like:

BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221

it gives:

$VAR1 = [
          'BUG-111',
          'BUG-122',
          'BUG-123',
          'JIRA-221',
          'JIRA-234'
        ];
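
If the goal is to update bugs.txt itself rather than just print the unique IDs, the same idea can be extended to write the result back. The following is only a minimal sketch: the helper names `unique_ids` and `dedupe_file` are made up for illustration, and it assumes the file is small enough to slurp and that a single comma-separated output line (as in the question's sample output) is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unique IDs found in $text, in first-seen order.
sub unique_ids {
    my ($text) = @_;
    my %seen;
    # Split on runs of commas/whitespace; skip empty fields, keep first occurrences.
    return grep { length($_) && !$seen{$_}++ } split /[,\s]+/, $text;
}

# Hypothetical helper: rewrites $path so that each ID appears once,
# comma-separated on a single line (assumed output format).
sub dedupe_file {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    $text = '' unless defined $text;     # guard against an undefined read

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} join(', ', unique_ids($text)), "\n";
    close $out or die "Cannot close $path: $!";
}

# Usage: perl dedupe.pl /path/to/file/bugs.txt
dedupe_file($ARGV[0]) if @ARGV;
```

Note that this overwrites the file in place, so it may be worth keeping a copy of the original while trying it out.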
Toto

You can try this:

use strict;
use warnings;

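my @bugs;
# Collect the IDs from every line; split on runs of commas and whitespace.
while (my $line = <DATA>) {
    push @bugs, grep { length($_) } split /[,\s]+/, $line;
}
my @Sequenced = RemoveDup(@bugs);

print join(', ', @Sequenced), "\n";

# Return the arguments with duplicates removed, preserving first-seen order.
sub RemoveDup {
    my %checked;
    return grep { !$checked{$_}++ } @_;
}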


__DATA__
BUG-111, BUG-122, BUG-123, BUG-111, BUG-123, JIRA-221, JIRA-234, JIRA-221
ssr1012