5

Let's say I have a string whose length is a multiple of 3.

my $seq = "CTTCGAATT"; # in this case length of 9

Is there a way to split it into equal-length chunks of 3, so that in the end I have this array:

$VAR = ["CTT", "CGA", "ATT"];
neversaint
    Crossposted to Perlmonks. There, three solutions were provided, with benchmarks demonstrating the unpack method to be a good choice. http://www.perlmonks.org/?node_id=939987 – DavidO Nov 25 '11 at 09:12

4 Answers

16

Take a look at the solutions to "How can I split a string into chunks of two characters each in Perl?"

The unpack approach in particular might be interesting:

my @codons = unpack("(A3)*", $seq);
spuelrich
3

my $str='ABCDEFGHIJKLM';

We can use a global regex match to pull pieces out of the string, where each piece has a minimum length of 1 and a maximum of the required length (3, 4, or whatever):

my @parts = $str =~ /(.{1,4})/g;

This gives @parts = ('ABCD', 'EFGH', 'IJKL', 'M').

Atul Gupta
3

Iterate over multiples of three, using substr to get pieces to push into a list.
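
A minimal sketch of that loop, assuming the `$seq` string from the question (the variable names `@codons` and `$i` are only illustrative):

```perl
use strict;
use warnings;

my $seq = "CTTCGAATT";
my @codons;

# Step through the string at offsets 0, 3, 6, ... and take 3 characters at each.
# If the length were not a multiple of 3, the last piece would simply be shorter.
for (my $i = 0; $i < length($seq); $i += 3) {
    push @codons, substr($seq, $i, 3);
}

print join(",", @codons), "\n";   # CTT,CGA,ATT
```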

Alex Reynolds
  • Thanks. In practice I have around 10 million such strings to break. Might substr be too slow? – neversaint Nov 25 '11 at 07:06
  • Try it out. If it is slow, then read through the file character by character until you fill a buffer, which you push into a list. Repeat until EOF. – Alex Reynolds Nov 25 '11 at 07:43
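
One way to "try it out" is the core Benchmark module. The sketch below is only an assumption about how such a comparison might be set up: the test string and the helper names `by_unpack`, `by_regex` and `by_substr` are made up for illustration, and the timings will depend on your Perl version and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $seq = "CTTCGAATT" x 1000;   # made-up test input, 9000 characters

# Each helper returns a reference to the list of 3-character chunks.
sub by_unpack { my @c = unpack("(A3)*", $_[0]); return \@c }
sub by_regex  { my @c = $_[0] =~ /(.{1,3})/gs;  return \@c }
sub by_substr {
    my @c;
    for (my $i = 0; $i < length($_[0]); $i += 3) {
        push @c, substr($_[0], $i, 3);
    }
    return \@c;
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, {
    unpack => sub { by_unpack($seq) },
    regex  => sub { by_regex($seq) },
    substr => sub { by_substr($seq) },
});
```

cmpthese prints a table of rates and relative speeds, so the candidates can be compared on input that resembles the real data before committing to one of them.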
3
my $str = join '', map { ('A','T','C','G')[ rand 4 ] } 1 .. 900 ; # Random string of 900 bases

my @codons = $str =~ /[ACTG]{3}/g;   # Process in chunks of three
                                     # '/g' flag necessary

print 'Size of @codons array : ',
        scalar @codons;              # '300'
Zaid