I'm trying to deobfuscate the following Perl code (source):
#!/usr/bin/perl
(my$d=q[AA GTCAGTTCCT
CGCTATGTA ACACACACCA
TTTGTGAGT ATGTAACATA
CTCGCTGGC TATGTCAGAC
AGATTGATC GATCGATAGA
ATGATAGATC GAACGAGTGA
TAGATAGAGT GATAGATAGA
GAGAGA GATAGAACGA
TC GATAGAGAGA
TAGATAGACA G
ATCGAGAGAC AGATA
GAACGACAGA TAGATAGAT
TGAGTGATAG ACTGAGAGAT
AGATAGATTG ATAGATAGAT
AGATAGATAG ACTGATAGAT
AGAGTGATAG ATAGAATGAG
AGATAGACAG ACAGACAGAT
AGATAGACAG AGAGACAGAT
TGATAGATAG ATAGATAGAT
TGATAGATAG AATGATAGAT
AGATTGAGTG ACAGATCGAT
AGAACCTTTCT CAGTAACAGT
CTTTCTCGC TGGCTTGCTT
TCTAA CAACCTTACT
G ACTGCCTTTC
TGAGATAGAT CGA
TAGATAGATA GACAGAC
AGATAGATAG ATAGAATGAC
AGACAGAGAG ACAGAATGAT
CGAGAGACAG ATAGATAGAT
AGAATGATAG ACAGATAGAC
AGATAGATAG ACAGACAGAT
AGACAGACTG ATAGATAGAT
AGATAGATAG AATGACAGAT
CGATTGAATG ACAGATAGAT
CGACAGATAG ATAGACAGAT
AGAGTGATAG ATTGATCGAC
TGATTGATAG ACTGATTGAT
AGACAGATAG AGTGACAGAT
CGACAGA TAGATAGATA
GATA GATAGATAG
ATAGACAGA G
AGATAGATAG ACA
GTCGCAAGTTC GCTCACA
])=~s/\s+//g;%a=map{chr $_=>$i++}65,84,67,
71;$p=join$;,keys%a;while($d=~/([$p]{4})/g
){next if$j++%96>=16;$c=0;for$d(0..3){$c+=
$a{substr($1,$d,1)}*(4**$d)}$perl.=chr $c}
eval $perl;
When run, it prints out Just another genome hacker.
After running the code through Deparse and perltidy (perl -MO=Deparse jagh.pl | perltidy), the code looks like this:
( my $d =
"AA...GCTCACA\n" # snipped double helix part
) =~ s/\s+//g;
(%a) = map( { chr $_, $i++; } 65, 84, 67, 71 );
$p = join( $;, keys %a );
while ( $d =~ /([$p]{4})/g ) {
    next if $j++ % 96 >= 16;
    $c = 0;
    foreach $d ( 0 .. 3 ) {
        $c += $a{ substr $1, $d, 1 } * 4**$d;
    }
    $perl .= chr $c;
}
Here's what I've been able to decipher on my own.
( my $d =
"AA...GCTCACA\n" # snipped double helix part
) =~ s/\s+//g;
This removes all whitespace from $d (the double helix).
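For comparison, here is a minimal Python sketch of the same whitespace stripping. It assumes only the first two lines of the helix as input; the variable names `d` and `cleaned` are illustrative and not part of the original program.

```python
import re

# First two lines of the double helix (inner spacing shortened for the example)
d = "AA GTCAGTTCCT\nCGCTATGTA ACACACACCA\n"

# Rough equivalent of Perl's s/\s+//g: delete every run of whitespace,
# including the newlines between the helix lines
cleaned = re.sub(r"\s+", "", d)

print(cleaned)  # AAGTCAGTTCCTCGCTATGTAACACACACCA
```

After this step the string should contain nothing but the letters A, T, C and G.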
(%a) = map( { chr $_, $i++; } 65, 84, 67, 71 );
This makes a hash with the keys A, T, C and G and the values 0, 1, 2 and 3. I normally code in Python, so this translates to the dictionary {'A': 0, 'T': 1, 'C': 2, 'G': 3} in Python.
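As a sanity check on that reading, the map can be mirrored in Python. This is only a sketch: `codes` is an illustrative name, and it relies on the fact that Perl's `$i++` on an undefined `$i` yields 0 the first time.

```python
# ASCII codes passed to map in the Perl source: A, T, C, G
codes = (65, 84, 67, 71)

# chr($_) => $i++ pairs each character with a running counter starting at 0
a = {chr(code): i for i, code in enumerate(codes)}

print(a)  # {'A': 0, 'T': 1, 'C': 2, 'G': 3}
```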
$p = join( $;, keys %a );
This joins the keys of the hash with $;, the subscript separator for multidimensional array emulation. The documentation says that the default is "\034", the same as SUBSEP in awk, but when I do:
my @ascii = unpack("C*", $p);
print $ascii[1];
I get the value 28? Also, it is not clear to me how this emulates a multidimensional array. Is $p now something like [['A'], ['T'], ['C'], ['G']] in Python?
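One thing worth checking here is the number base: "\034" is an octal escape, and octal 34 is decimal 28, so the two values may well be the same byte. The sketch below illustrates this in Python, which uses the same octal escape syntax. It also suggests that `$p` is a flat string rather than a nested list. The key order `A, T, C, G` is an assumption, since Perl does not guarantee the order returned by `keys`.

```python
# "\034" is an octal escape in Python too: 0o34 == 28
sep = "\034"
print(ord(sep))  # 28

# join($;, keys %a) produces one flat string, not a nested structure
# (key order assumed here; Perl's keys() order is unspecified)
p = sep.join(["A", "T", "C", "G"])

# Same idea as unpack("C*", $p): one integer per byte
ascii_codes = [ord(ch) for ch in p]
print(ascii_codes)  # [65, 28, 84, 28, 67, 28, 71]
```

The multidimensional emulation that the documentation mentions applies to hash subscripts such as `$h{1,2}`, which Perl treats as `$h{join($;, 1, 2)}`. In this program `$;` seems to serve only as a convenient separator string.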
while ( $d =~ /([$p]{4})/g ) {
As long as $d matches ([$p]{4}), execute the code in the while block. But since I don't completely understand what structure $p is, I also have a hard time understanding what happens here.
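If `$p` is a plain string of the four letters separated by the "\034" byte, then `[$p]` interpolates into a character class that matches any one of those characters, and `{4}` asks for four of them in a row. With the `/g` flag, each pass of the while loop picks up the next non-overlapping match. A rough Python sketch of that behaviour, with an assumed key order and only the first twelve letters of the helix as input:

```python
import re

# Assumed contents of $p: the four keys joined by the "\034" separator
p = "\034".join(["A", "T", "C", "G"])

# [$p]{4} becomes a character class followed by a repeat count of 4
pattern = "([" + re.escape(p) + "]{4})"

# First line of the helix with whitespace already removed
d = "AAGTCAGTTCCT"

# findall walks the string left to right, like the while (... /g) loop
groups = re.findall(pattern, d)
print(groups)  # ['AAGT', 'CAGT', 'TCCT']
```

Because the whitespace has already been stripped, the separator byte never appears in the input, so each match is simply the next four letters.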
next if $j++ % 96 >= 16;
Skip to the next iteration (like continue in Python) if $j modulo 96 is greater than or equal to 16. $j increments with each pass of the while loop (?).
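To see which four-letter groups survive this filter, the condition can be replayed over a range of counter values. This is a sketch only: `kept` and the range of 200 are illustrative. Because `$j++` is a post-increment, the test sees the value before it is incremented, starting from 0.

```python
# Replay the filter for the first 200 matches:
# a group is skipped when (j % 96) >= 16, otherwise it is decoded
kept = [j for j in range(200) if not (j % 96 >= 16)]

print(kept[:3])     # [0, 1, 2]
print(kept[15:18])  # [15, 96, 97]
print(len(kept))    # 40
```

So out of every 96 consecutive groups, only the first 16 appear to be decoded and the remaining 80 are ignored.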
$c = 0;
foreach $d ( 0 .. 3 ) {
    $c += $a{ substr $1, $d, 1 } * 4**$d;
}
For $d in the range from 0 to 3, extract some substring, but at this point I'm completely lost. The last few lines concatenate everything and evaluate the result.
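One way to read the inner loop is as a base-4 decoder: `substr $1, $d, 1` takes the letter at position `$d` of the matched group, `%a` turns it into a digit from 0 to 3, and `4**$d` gives that digit its place value, with the least significant digit first. The sum is a character code. A minimal Python sketch of that interpretation, where `decode_quad` is a hypothetical helper name and the input is the first twenty letters of the helix:

```python
# Letter-to-digit table built earlier in the Perl program
a = {"A": 0, "T": 1, "C": 2, "G": 3}

def decode_quad(quad):
    """Treat four letters as base-4 digits, least significant first."""
    c = 0
    for d in range(4):
        c += a[quad[d]] * 4**d
    return chr(c)

# First twenty letters of the helix after whitespace removal
d = "AAGTCAGTTCCTCGCTATGT"

# Equivalent of $perl .= chr $c for each group of four letters
perl = "".join(decode_quad(d[i:i + 4]) for i in range(0, len(d), 4))
print(perl)  # print
```

For example, AAGT gives 0 + 0*4 + 3*16 + 1*64 = 112, which is the character p. The decoded string starts with print, which fits the final eval executing Perl code that was hidden in the helix. Note that the inner loop reuses the name `$d` as its index; Perl's foreach localizes the loop variable, so the helix string is restored once the loop finishes.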