I have this text (shortened version of my original text):

mytext.txt BAHJSBUBGUCYHAGSBUCAGSUCBASBCYHUBXZCZPZHCUIHAUISHCIUJXZJCBZYAUSGHDYUAGWEBWHBHJASBHJASCXZBUYTRTRTRJFUARGAFGOOPWWKBBCAAAABBXHABSDAUSBCZAAAAAAAAACGAFAXHJBJHXZCXZCCZCXZUCAGSUCBASBCYHUBXZCZPZHCUIHAUISHCIUJXZJCBZYAUSGHDYUAGWEBWHBHJASBHJASCXZBUYHABSDAUSZXHJBRRRRRRJFUABGAFGLLPKWAACAAAABBZJHXZXHJBJHXZXHJBJHXJBJHXZCXZCCZCXZUCAGSAJIJICXZIJUAUUISUSJUSSJSJSJAJCXZXCZTTTTTRJFUABGAFGLOPKWABCAAAABBU

My code is below. It is intended to print all of the matches and then save them to a file as well. However, I am not getting any matches, while I expect there to be at least 10 in my original file.

open(text, "<mytext.txt");

push (@matches,$&) while(<text> =~ m{
    ([TR]{6}
    JFUA
    [ABR]{1}
    GAFG
    ( [LOP]{2,3} )
    [KW]{2,5}
    (??{ $2 =~ tr/LOP/ABC/r })
    AAAABB[UXZ]{1})
    /g
}x);

print "@matches\n";

my $filename = 'results_matches.txt';
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh "@matches\n";
close $fh;
print "done\n";

I have also tried the following code and this also does not work:

my @matches = <text> =~ m{
        ([TR]{6}
        JFUA
        [ABR]{1}
        GAFG
        ( [LOP]{2,3} )
        [KW]{2,5}
        (??{ $2 =~ tr/LOP/ABC/r })
        AAAABB[UXZ]{1})
        /g
    }x;

print "@matches\n";

I have the following code which successfully prints out only one (the first) result. But it fails to print all of the matches.

if (<text> =~ m{
    ([TR]{6}
    JFUA
    [ABR]{1}
    GAFG
    ( [LOP]{2,3} )
    [KW]{2,5}
    (??{ $2 =~ tr/LOP/ABC/r })
    AAAABB[UXZ]{1})
}x) {print "$1\n";}

I have followed the answers in this topic but have not been able to get any of them to work: How can I find all matches to a regular expression in Perl?

funotonah

1 Answer

By using while (<text> =~ ...), you are reading a new line from the file handle on each iteration of the loop, so every match attempt runs against a different line. You need two loops: an outer one iterating over the lines, and an inner one iterating over the matches within each line.

while (my $line = <text>) {
    push @matches, $1 while $line
        =~ m{
            ([TR]{6}
            JFUA
            [ABR]
            GAFG
            ( [LOP]{2,3} )
            [KW]{2,5}
            (??{ $2 =~ tr/LOP/ABC/r })
            AAAABB[UXZ])
        }xg;
}

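I also removed {1}, as it is redundant; used $1 instead of $&, because $& imposes a performance penalty on all the matching you do in a program; and removed the /g from inside the pattern, adding the g in the right place (i.e. next to }x). Inside m{...}x, the /g was treated as the literal characters / and g, which is why the pattern never matched.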
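
For reference, here is a minimal self-contained sketch of the same two-loop approach. The sample data is hypothetical (it is not the input from the question), and the in-memory file handle is only an assumption made so that the script runs without mytext.txt; tr///r needs Perl 5.14 or newer.

```perl
use v5.14;      # tr///r (non-destructive transliteration) needs 5.14+
use strict;
use warnings;

# Hypothetical sample input, two lines. The first line holds two valid
# sequences; the second must fail, because "LOP" translates to "ABC"
# while the text continues with "AAA".
my $data = join "\n",
    'XXTRTRTRJFUARGAFGOOPWWKBBCAAAABBXYYRRRRRRJFUABGAFGLLPKWAACAAAABBZ',
    'TTTTTTJFUAAGAFGLOPKWAAAAAAABBU',
    '';

# Assumption: read from a string instead of a real file.
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my @matches;
# Outer loop: one line per iteration.
while (my $line = <$in>) {
    # Inner loop: in scalar context, /g resumes after the previous match.
    # The (??{ ... }) part must equal group 2 with L->A, O->B, P->C.
    while ($line =~ m{
        ( [TR]{6}
          JFUA
          [ABR]
          GAFG
          ( [LOP]{2,3} )
          [KW]{2,5}
          (??{ $2 =~ tr/LOP/ABC/r })
          AAAABB [UXZ] )
    }xg) {
        push @matches, $1;
    }
}
close $in;

print "$_\n" for @matches;
# Prints:
# TRTRTRJFUARGAFGOOPWWKBBCAAAABBX
# RRRRRRJFUABGAFGLLPKWAACAAAABBZ
```

To run it against a real file, replace \$data with the file name; the lexical handle and the three-argument open work the same way.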

When testing, I copied and pasted the input from the question, so I have all the characters on one line. If your input is different, please post it using code formatting, not quotation.
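
If the real file turns out to have the sequence wrapped over several lines, a line-by-line loop would miss any match that straddles a line break. One possible workaround, sketched here under that assumption only (the wrapped sample data is hypothetical), is to read the whole file at once, delete the line breaks, and match against the joined text:

```perl
use v5.14;      # tr///r needs 5.14+
use strict;
use warnings;

# Hypothetical wrapped input: one valid sequence split across two lines.
my $data = "XXTRTRTRJFUARGAFGOOP\nWWKBBCAAAABBXYY\n";

# Assumption: read from a string instead of a real file.
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
my $text = do { local $/; <$in> };   # undefined $/ reads everything at once
close $in;

$text =~ tr/\r\n//d;                 # join the lines by deleting line breaks

my @matches;
push @matches, $1 while $text =~ m{
    ( [TR]{6}
      JFUA
      [ABR]
      GAFG
      ( [LOP]{2,3} )
      [KW]{2,5}
      (??{ $2 =~ tr/LOP/ABC/r })
      AAAABB [UXZ] )
}xg;

print "$_\n" for @matches;
# Prints:
# TRTRTRJFUARGAFGOOPWWKBBCAAAABBX
```

This holds the whole file in memory, which is fine for moderately sized inputs but may not suit very large ones.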

choroba