
I'm trying to access the individual strings in the alignment object produced by Biopython's pairwise aligner, but I'm not getting anywhere. I mean the already-aligned sequences showing gaps, as printed by `print(alignment)`. I want to get them individually, or even slice them. The documentation says this is possible, but I'm getting errors.

```python
from Bio import Align

aligner = Align.PairwiseAligner(mode='global', gap_score=-5)
my_target = 'CAGGTGCAGCTGGTGCAGAGCGGCGCGGAAGTGAAAAAACCGGGCAGCAGCG'
my_query = 'CAGTGCAGCTGGTGCAGAGCGACGCGGAAGTGAAAAAACCGGGAGCAGCG'
aln = aligner.align(my_target, my_query)  # all optimal alignments
print(aln[0])                             # show the first one
```

The result is:

```
CAGGTGCAGCTGGTGCAGAGCGGCGCGGAAGTGAAAAAACCGGGCAGCAGCG
|||-||||||||||||||||||.|||||||||||||||||||||-|||||||
CAG-TGCAGCTGGTGCAGAGCGACGCGGAAGTGAAAAAACCGGG-AGCAGCG
```

Now I'd like to get the query sequence in the bottom line on its own. I can access `aln[0].query`, but that seems to be just the raw query sequence, not the aligned version with gaps.

The documentation says the alignment object can be indexed and sliced, but this simply doesn't work for me.

What I get instead:

```python
aln.alignment[1]
```

```
File c:\Anaconda3\lib\site-packages\Bio\Align\__init__.py:1024, in PairwiseAlignment.__getitem__(self, key)
   1022     raise NotImplementedError
   1023 if isinstance(key, int):
-> 1024     raise NotImplementedError
   1025 if isinstance(key, tuple):
   1026     try:

NotImplementedError:
```

The documentation:

[Screenshot of the relevant Biopython documentation]

I'd appreciate any help or pointers. Cheers.

MaxSense
  • What version of Biopython and Python are you using? – ftorre May 29 '23 at 20:07
  • Good point. I have Conda with Python 3.9.12, but it's running Biopython 1.78, which conda installed just a couple of days ago. Now I can see this is not the latest version. – MaxSense May 29 '23 at 20:23
  • Using Python 3.11 and Biopython 1.81, it seems to work, but you need to use `aln._alignment[1]`. – ftorre May 29 '23 at 20:28
  • Updated to 1.81, but still no luck on Python 3.9.12. I need to check Python 3.8 next; it seems to be mentioned in the Biopython reference. – MaxSense May 29 '23 at 21:17
  • OK, that's actually working. I've got 1.81 and Python 3.11, and I can see the `._alignment` property now (but not `.alignment`). I also discovered that the VS Code interactive window was pointed at the base environment while I was working in another one. After cleaning that up, I can see what I needed. Many thanks @ftorre – MaxSense May 29 '23 at 21:47
  • This is getting weird... I can access `aln._alignment`, but only after I use `aln[0]` somewhere upstream in the code. Otherwise I get "object has no attribute '_alignment'". So this is unstable... – MaxSense May 30 '23 at 21:44
  • Indeed, names prefixed with `_` generally mean they are meant to be used only by the package itself, not by the end user (see [this post](https://stackoverflow.com/questions/1301346/what-is-the-meaning-of-single-and-double-underscore-before-an-object-name)). Looking through the [source code](https://github.com/biopython/biopython/blob/master/Bio/Align/__init__.py), I noticed that `_alignment` is actually what `__getitem__` returns. You can therefore access the aligned query with `aln[0][1]` or `aln[1][1]`, since there are two alignments in your case. `aln[0][0]` returns the target. – ftorre May 30 '23 at 21:49
  • Biopython 1.80 works fine with just `print(aln[0][1])` and `print(aln[0][0])`; `print(aln[0])` and `print(aln[1])` each print an entire alignment. – pippo1980 Jun 01 '23 at 08:43
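The instability reported in the comments, where `_alignment` only exists once `aln[0]` has been evaluated, fits a lazy container that generates alignments on demand and caches the last one it produced in a private attribute. Below is a simplified, hypothetical model of that pattern (`LazyAlignments` and its attributes are invented for illustration; this is not Biopython's code). It shows why reading the private cache is fragile while plain indexing is not.

```python
class LazyAlignments:
    """Hypothetical model of a lazy, indexable collection of alignments.

    Items come from an iterator one at a time; the most recently produced
    item is cached in ``self._alignment``, so that attribute does not exist
    until something has been indexed.
    """

    def __init__(self, make_iterator):
        self._make_iterator = make_iterator  # callable returning a fresh iterator
        self._items = make_iterator()
        self._index = -1  # index of the cached item; -1 means nothing cached yet

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indices are not supported in this sketch")
        if index < self._index:
            # Going backwards: restart the iterator from the beginning.
            self._items = self._make_iterator()
            self._index = -1
        while self._index < index:
            try:
                self._alignment = next(self._items)
            except StopIteration:
                raise IndexError("index out of range") from None
            self._index += 1
        return self._alignment


alns = LazyAlignments(lambda: iter(["alignment 0", "alignment 1"]))
print(hasattr(alns, "_alignment"))  # False: nothing generated yet
print(alns[1])                      # generates items 0 and 1, prints "alignment 1"
print(hasattr(alns, "_alignment"))  # True: the private cache now exists
```

The practical takeaway matches ftorre's comment: index the public object (`aln[0][1]`) rather than reaching for a private attribute whose existence depends on what ran before.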

1 Answer


Answering my own question. In Biopython 1.81 with Python 3.11.3, the object returned by `aligner.align` can be indexed, and each alignment can in turn be indexed to get the aligned strings, including the gaps that mark the insertions/deletions. So with the code from the original question I do:

```python
aln[0][1]
```

to get:

```
CAG-TGCAGCTGGTGCAGAGCGACGCGGAAGTGAAAAAACCGGG-AGCAGCG
```
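Why `aln[0].query` comes back ungapped while `aln[0][1]` does not: as far as I understand Biopython 1.80+, an alignment keeps the original sequences (`.target`, `.query`) plus a coordinate table (`.coordinates`), and the gapped rows are built from those when you index it. The plain-Python sketch below imitates that step. `gapped_rows` is a hypothetical helper, not Biopython's implementation, and it assumes forward-strand coordinates where a row whose coordinate does not advance between two breakpoints has a gap there. The example reproduces the first six columns of the question's alignment.

```python
def gapped_rows(sequences, coordinates):
    """Build gapped alignment rows from ungapped sequences and breakpoints.

    coordinates[i] lists breakpoints in sequences[i]. Between consecutive
    breakpoints, rows that advance contribute residues and rows that do not
    advance contribute gaps (assumed convention, forward strand only).
    """
    rows = ["" for _ in sequences]
    for k in range(len(coordinates[0]) - 1):
        steps = [coords[k + 1] - coords[k] for coords in coordinates]
        width = max(steps)  # number of alignment columns in this block
        for i, (seq, coords) in enumerate(zip(sequences, coordinates)):
            if steps[i] == 0:
                rows[i] += "-" * width  # this row has a gap across the block
            else:
                rows[i] += seq[coords[k]:coords[k + 1]]
    return rows


target, query = "CAGGTG", "CAGTG"
coords = [[0, 3, 4, 6],  # target breakpoints
          [0, 3, 3, 5]]  # query breakpoints: no advance at 3 -> 3 means a gap
print(gapped_rows([target, query], coords))  # ['CAGGTG', 'CAG-TG']
```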

or, to slice the aligned string:

```python
aln[0][1][start:stop]
```
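Keep in mind that `start:stop` in this slice counts alignment columns, gaps included, so after the first gap it no longer lines up with positions in the original query. A small plain-Python helper (hypothetical, not part of Biopython) that converts a column range of a gapped row back to 0-based, half-open positions in the ungapped sequence:

```python
def column_range_to_sequence_range(gapped_row, start, stop):
    """Map alignment columns [start, stop) of a gapped row to the
    half-open range of ungapped sequence positions they cover."""
    # Residues (non-gap characters) that appear before column `start`.
    seq_start = len(gapped_row[:start].replace("-", ""))
    # Residues inside the column range itself.
    seq_stop = seq_start + len(gapped_row[start:stop].replace("-", ""))
    return seq_start, seq_stop


aligned_query = "CAG-TGCAGCTGGTGCAGAGCGACGCGGAAGTGAAAAAACCGGG-AGCAGCG"  # aln[0][1] above
query = aligned_query.replace("-", "")  # the original, ungapped query
start, stop = 2, 6
print(aligned_query[start:stop])  # G-TG  (four columns, three residues)
s, e = column_range_to_sequence_range(aligned_query, start, stop)
print(s, e, query[s:e])           # 2 5 GTG
```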

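It did not work in my Python 3.9.12 setup, though (`NotImplementedError` when trying `aln[0][N]`, with N being 0 or 1). That error is raised by the older `PairwiseAlignment` class, so that environment was most likely still loading an older Biopython.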
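Since the comments traced part of the trouble to the environment (VS Code's interactive window was using the base conda environment), it may help to print which interpreter and which Biopython install are actually in use. A minimal sketch; the 1.80 cut-off in the hypothetical `row_indexing_supported` helper is an assumption based only on the versions reported in this thread (1.78 failed, 1.80 and 1.81 worked), not on a documented compatibility table.

```python
import re
import sys


def row_indexing_supported(version):
    """True if `version` is at least 1.80 (assumed cut-off, see above)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (1, 80)


# Which interpreter is running this code (base env or the intended conda env)?
print("Python:", sys.executable, sys.version.split()[0])
try:
    import Bio
except ImportError:
    print("Biopython is not installed in this environment")
else:
    print("Biopython:", Bio.__version__, "from", Bio.__file__)
    print("aln[0][1] expected to work:", row_indexing_supported(Bio.__version__))
```

Running this in the same kernel or terminal as the failing code shows whether it is picking up the environment you upgraded.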

MaxSense