I have a multiple sequence alignment (Clustal) file. I want to read it and print the sequences in a clearer, more orderly layout.

I am doing this with Biopython's AlignIO module:

from Bio import AlignIO

alignment = AlignIO.read("opuntia.aln", "clustal")

print("Number of rows: %i" % len(alignment))

for record in alignment:
    print("%s - %s" % (record.id, record.seq))

My output is messy and requires a lot of scrolling. What I want is to print only 50 characters of each sequence per line, and continue in blocks until the end of the alignment.

I wish to have output like this, from http://www.ebi.ac.uk/Tools/clustalw2/.

David Cain
MysticCodes

2 Answers

Do you require something more complex than simply breaking record.seq into chunks of 50 characters, or am I missing something?

You can use Python sequence slicing to achieve that very easily. seq[N:N+50] returns the 50 elements starting at index N (or fewer if the sequence ends first):

In [23]: import random

In [24]: seq = ''.join(str(random.randint(1, 4)) for i in range(200))

In [25]: seq
Out[25]: '13313211211434211213343311221443122234343421132111223234141322124442112343143112411321431412322123214232414331224144142222323421121312441313314342434231131212124312344112144434314122312143242221323123'

In [26]: for n in range(0, len(seq), 50):
   ....:     print(seq[n:n+50])
   ....:     
13313211211434211213343311221443122234343421132111
22323414132212444211234314311241132143141232212321
42324143312241441422223234211213124413133143424342
31131212124312344112144434314122312143242221323123
Eli Bendersky
  • I need something more complex: suppose record.id[1] should show 50 characters of its sequence on the first line, record.id[2] should show 50 characters on the second line, and so on in the same style. Moreover, the sequences can be any length above 50 – MysticCodes May 22 '10 at 13:40
  • @user343934: sorry, I still don't understand what you want – Eli Bendersky May 22 '10 at 13:44
  • When I print record.seq it shows the whole sequence, but I want to split this long sequence into 50 characters per line. Simply put, I want output like this: http://i45.tinypic.com/4vh5rc.jpg instead of mine: http://i48.tinypic.com/ae48ew.jpg – MysticCodes May 22 '10 at 13:46

I don't have biopython on this computer, so this isn't tested, but it should work:

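chunk_size = 50
length = alignment.get_alignment_length()  # `alignment` comes from AlignIO.read() as in the question

for i in range(0, length, chunk_size):
    print("")  # blank line between blocks
    for record in alignment:
        # the trailing number is the last alignment column shown (capped at the end)
        print("%s\t%s %i" % (record.name, record.seq[i:i + chunk_size], min(i + chunk_size, length)))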

This uses the same trick as Eli's answer: range sets up the start index of each slice, and for every slice the inner loop goes over all records in the alignment, so each block shows the same 50 columns of every sequence.
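If you want to try the block layout without Biopython installed, here is a minimal sketch of the same idea on plain (name, sequence) pairs. The function name format_alignment_blocks and the padding of names to a common width are my own choices for illustration, not part of Biopython, and the trailing number is just the last column shown rather than a per-sequence residue count.

```python
def format_alignment_blocks(records, chunk_size=50):
    """Return a Clustal-like text layout for (name, sequence) pairs.

    Hypothetical helper: assumes all sequences have the same length,
    as they do in an alignment.
    """
    records = [(name, str(seq)) for name, seq in records]
    if not records:
        return ""
    length = len(records[0][1])
    # Pad every name to the longest one so the sequence columns line up.
    width = max(len(name) for name, _ in records)
    blocks = []
    for start in range(0, length, chunk_size):
        end = min(start + chunk_size, length)  # cap the last block
        lines = [
            "%s  %s %d" % (name.ljust(width), seq[start:end], end)
            for name, seq in records
        ]
        blocks.append("\n".join(lines))
    # One blank line between blocks.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = [("seqA", "ACGT" * 30), ("seqB_long", "TGCA" * 30)]
    print(format_alignment_blocks(demo, chunk_size=50))
```

With Biopython, something like format_alignment_blocks((rec.id, rec.seq) for rec in alignment) should work the same way, since each sequence is converted with str() before slicing.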

david w
  • Oh, there's a typo in there. The second part of that tuple for the string formatting should be "record.seq[i: i+chunk_size ]" – david w May 23 '10 at 23:13