def __init__(self,emps=str(""),l=[">"]):
    self.str=emps
    self.bl=l


def fromFile(self,seqfile):
    opf=open(seqfile,'r')                                       
    s=opf.read()                                              
    opf.close()                                                    
    lisst=s.split(">")                                             
    if s[0]==">":
        lisst.pop(0)                                                    
    nlist=[]
    for x in lisst:
        splitenter=x.split('\n')                                        
        splitenter.pop(0)                                               
        splitenter.pop()                                                
        splitstring="".join(splitenter)                                 
        nlist.append(splitstring)                                       
    nstr=">".join(nlist)                                                
    nstr=nstr.split()
    nstr="".join(nstr)
    for i in nstr:
        self.bl.append(i)
    self.str=nstr
    return nstr

def getSequence(self):
    print self.str
    print self.bl
    return self.str

def GpCratio(self):
    pgenes=[]
    nGC=[]
    for x in range(len(self.lb)):                                   
        if x==">":
            pgenes.append(x)                                           
    for i in range(len(pgenes)):                                        
        if i!=len(pgenes)-1:                                            
            c=krebscyclus[pgenes[i]:pgenes[i+1]].count('c')+0.000       
            g=krebscyclus[pgenes[i]:pgenes[i+1]].count('g')+0.000                                          
            ratio=(c+g)/(len(range(pgenes[i]+1,pgenes[i+1])))
            nGC.append(ratio)                                           
    return nGC  

s = Sequence()
s.fromFile('D:\Documents\Bioinformatics\sequenceB.txt')
print 'Sequence:\n', s.getSequence(), '\n'
print "G+C ratio:\n", s.GpCratio(), '\n'

I don't understand why it gives this error:

in GpCratio     for x in range(len(self.lb)): AttributeError: Sequence instance has no attribute 'lb'. 

When I print the list in getSequence, it prints the correct list of DNA bases, but I cannot use that list to search for nucleotides. My university only allows me to read one input file and does not allow any arguments other than "self" in the method definitions. By the way, these methods belong to a class called Sequence, but the site would not let me post the class line.


Cédric Julien
Niels
  • about your `__init__(self,emps=str(""),` `l=[">"]` `)`: [beware of mutable default parameters in python](http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument) – bernard paulus Nov 23 '12 at 17:01

1 Answer


Looks like a typo. You define self.bl in your __init__() routine, then try to access self.lb.

(Also, emps=str("") is redundant - emps="" works just as well.)
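A minimal sketch of a constructor that uses one attribute name consistently and also avoids the shared mutable default mentioned in the comment on the question. The `class` line and the parameter names `seq` and `bases` are assumptions for illustration, since the question does not show the full class:

```python
class Sequence(object):
    def __init__(self, seq="", bases=None):
        self.str = seq
        # Build a fresh list per instance; a default of [">"] written in the
        # signature would be shared by every Sequence created without it.
        self.bl = [">"] if bases is None else list(bases)


a = Sequence()
b = Sequence()
a.bl.append("A")
print(a.bl)  # ['>', 'A']
print(b.bl)  # ['>']
```

With the original `l=[">"]` default, the second instance would have seen the appended `"A"` as well.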

But even if you correct that typo, the loop won't work:

for x in range(len(self.bl)):   # This iterates over a list like [0, 1, 2, 3, ...]
    if x==">":                  # This condition will never be True
        pgenes.append(x) 

You probably need to do something like

pgenes=[]
for x in self.bl:
    if x==">":                  # Shouldn't this be != ?
        pgenes.append(x) 

which can also be written as a list comprehension:

pgenes = [x for x in self.bl if x==">"]

In Python, you hardly ever need len(x) or for n in range(...); instead, you iterate directly over the sequence or iterable.
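If the indices of the `>` markers are wanted rather than the markers themselves, `enumerate` yields the index and the item together, so `range(len(...))` is still unnecessary. A small sketch, using a made-up list in place of `self.bl`:

```python
# Hypothetical stand-in for self.bl after fromFile() has run.
bl = [">", "A", "T", "G", ">", "C", "G"]

# Collect the index of every ">" marker.
pgenes = [i for i, base in enumerate(bl) if base == ">"]
print(pgenes)  # [0, 4]
```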

Since your program is incomplete and lacking sample data, I can't run it here to find all its other deficiencies. Perhaps the following can point you in the right direction. Assuming a string that contains the characters ATCG and >:

>>> gene = ">ATGAATCCGGTAATTGGCATACTGTAG>ATGATAGGAGGCTAG"
>>> pgene = ''.join(x for x in gene if x!=">")
>>> pgene
'ATGAATCCGGTAATTGGCATACTGTAGATGATAGGAGGCTAG'
>>> ratio = float(pgene.count("G") + pgene.count("C")) / (pgene.count("A") + pgene.count("T"))
>>> ratio
0.75

If, however, you don't want to look at the entire string but at separate genes (where > is the separator), use something like this:

>>> gene = ">ATGAATCCGGTAATTGGCATACTGTAG>ATGATAGGAGGCTAG"
>>> genes = [g for g in gene.split(">") if g !=""]
>>> genes
['ATGAATCCGGTAATTGGCATACTGTAG', 'ATGATAGGAGGCTAG']
>>> nGC = [float(g.count("G")+g.count("C"))/(g.count("A")+g.count("T")) for g in genes]
>>> nGC
[0.6875, 0.875]

However, if you want to calculate GC content, then of course you don't want (G+C)/(A+T) but (G+C)/(A+T+G+C) --> nGC = [float(g.count("G")+g.count("C"))/len(g) for g in genes].
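The GC-content variant can be wrapped in a small helper. This is only a sketch: the function name `gc_content`, the upper-casing (the question's code counts lowercase `'c'` and `'g'`), and the handling of empty records are assumptions, not part of the original code:

```python
def gc_content(gene):
    """Return the fraction of G and C bases in gene, ignoring case."""
    gene = gene.upper()
    if not gene:
        return 0.0  # assumption: report 0 for an empty record instead of raising
    return float(gene.count("G") + gene.count("C")) / len(gene)


record = ">ATGAATCCGGTAATTGGCATACTGTAG>ATGATAGGAGGCTAG"
genes = [g for g in record.split(">") if g]
contents = [gc_content(g) for g in genes]
print([round(c, 4) for c in contents])  # [0.4074, 0.4667]
```

Note that dividing by `len(gene)` only equals (G+C)/(A+T+G+C) when the string contains nothing but those four letters.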

Tim Pietzcker
  • I changed it to: for x in range(len(self.bl)): if self.bl[x]==">": pgenes.append(x) but it still returns an empty list – Niels Dec 12 '11 at 21:18
  • I changed it to pgenes = [x for x in self.bl if x==">"] (although I'm not familiar with this style yet), but it still returns an empty list – Niels Dec 12 '11 at 21:25
  • Assuming that `self.bl == [">"]` (unchanged from the init), then it's no wonder. `len(pgenes)` will be 1, so `range(1)` is `[0]`. Therefore `if i!=len(pgenes)-1` is `False`, so nothing will ever be appended to `nGC`. – Tim Pietzcker Dec 12 '11 at 21:26
  • Why isn't it changed? In def fromFile I added: for i in nstr: self.bl.append(i), so it prints the correct DNA list: self.bl prints ['>','A','T','G','A', etc.], but when I try to calculate the ratio (G+C)/(A+T) it doesn't do anything except return an empty list – Niels Dec 12 '11 at 21:34
  • OK, I see what you mean. It looks like you're making things a lot more complicated than they need to be, though. What exactly is the meaning of `>`? Does it occur more than once in the string? Can you show a typical example for `self.bl` and then what the corresponding `pgenes` should look like? Right now, the routine collects all the `>`s (or their indices) from `self.bl` into `pgenes`... – Tim Pietzcker Dec 12 '11 at 21:38
  • A new gene starts at each >, which is why I called it p(ositions of)genes. It occurs about 68000 times, but I'm trying it on only 1 gene first to see whether it works; as you stated, range(len()) etc. doesn't make sense on 1 gene. In the second method I removed the comment lines in the file. In the end it will look like ['>', 'G', 'C', 'G', 'A', 'A', 'G', 'A', 'G', 'G', 'C', 'C', 'A', 'T', 'C', 'A','>', 'A', 'T', 'G', 'C', 'G', 'C','>','C', 'T', 'C', 'C']. For the file with only 1 gene, pgenes returns [0,865]. But you fixed my major problem with self.bl, so thank you a lot for that! – Niels Dec 12 '11 at 21:55
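For a flat list shaped like the one in the last comment, one possible way to get a per-gene ratio is to pair each `>` position with the next one and let the final gene run to the end of the list. This is a sketch under that assumption; the function name `gc_ratios` is hypothetical, and it computes (G+C)/length rather than the question's exact formula:

```python
def gc_ratios(bl):
    """Per-gene (G+C)/length for a flat list of bases with '>' markers."""
    starts = [i for i, base in enumerate(bl) if base == ">"]
    # Pair each marker with the next one; the last gene runs to the end.
    ends = starts[1:] + [len(bl)]
    ratios = []
    for start, end in zip(starts, ends):
        gene = bl[start + 1:end]  # skip the '>' itself
        if gene:                  # ignore empty records
            gc = gene.count("G") + gene.count("C")
            ratios.append(float(gc) / len(gene))
    return ratios


bl = [">", "G", "C", "G", "A", ">", "A", "T", "G", "C"]
print(gc_ratios(bl))  # [0.75, 0.5]
```

Unlike a loop guarded by `if i != len(pgenes)-1`, this also produces a value for the last gene, and for a list that contains only one `>`.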