2

I created this code:

gene = open("AY365046.1.txt","r")

g=0;
a=0;
c=0;
t=0;

gene.readline()

for line in gene:
    line = line.lower()
    for char in line:
        if char == "g":
            g+=1
        if char == "a":
            a+=1
        if char == "c":
            c+=1
        if char == "t":
            t+=1

print "Guanina: " + str(g)
print "Adenina: " + str(a)
print "Citosina: " + str(c)
print "Timina: " + str(t)

gc = (g+c+0.) / (a+t+c+g+0.)

print "Conteúdo GC: " +str(gc)

Now I want to make it interactive. My goal is to use the input() function to get a "sequence number" and then display the corresponding data.

The code above only reads the data of one sequence/file (AY365046.1.txt). I need the code to access more files (for example, sequence1.txt and sequence2.txt) and then count g, a, c and t in the sequence/file named in the input() call.

For example:

1) The system asks for the sequence number

2) The user types sequence2

3) The system gets the data from sequence2.txt

4) The variables g, a, c and t get their values from that file

5) If the sequence doesn't exist, print an error...

As far as I understand, to do all that, I just need to declare the variables, assign the .txt files to each one of them, and make an if/else...

The problem is that I have tried all that I could find, and nothing works...

Obviously I am not asking you to write the code for me, but can you at least tell me where I need to start? Is my logic for what I need to do correct? Am I missing something?

Ricardo
  • I see that you are not closing your file. You should call `gene.close()` somewhere in your code, or use `with open("AY365046.1.txt","r") as f:` instead. –  Aug 14 '15 at 08:42
  • Instead of `input`, you can enter the filename in the command line with `sys.argv`. See [this question](http://stackoverflow.com/questions/983201/python-and-sys-argv). Also, you don't need `gene.readline()`. – Mel Aug 14 '15 at 08:45
  • What behaviour do you observe and what behaviour did you expect? You seem to skip the first line in the file (due to the `gene.readline()` expression). – skyking Aug 14 '15 at 08:53
  • Exactly that, @skyking ... I get the file from the NCBI database, and the first line is useless... The real data is below that, so... – Ricardo Aug 14 '15 at 09:06
  • I'm going to try, rsm and tmoreau... Thanks.. – Ricardo Aug 14 '15 at 09:08
  • @Ricardo, you should explain what you expect your program to do. Also, don't forget to accept an answer so the question is removed from the unanswered list. –  Aug 14 '15 at 09:11
  • You can use `guanina = line.count('g')` and `cytosina = line.count('c')`, and after that divide `len(line)/(guanina+cytosina)`. I know you're just a beginner, but less is more! – vds Aug 14 '15 at 09:16
  • @vds Thanks... I will :) – Ricardo Aug 14 '15 at 09:39
  • `(guanina+cytosina)/len(line)` is the right way, my bad – vds Aug 14 '15 at 09:50
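
The `count` suggestion from the comments above could be sketched roughly as follows. The function name `gc_content` is made up for illustration, and the sketch divides by the number of a/c/g/t letters (as the question's own formula does) rather than by `len(line)`, so that newlines and other characters do not affect the result. It is only one possible way to do it.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the A/C/G/T letters of `sequence`."""
    seq = sequence.lower()
    # str.count scans the string once per base, replacing the character loop
    counts = {base: seq.count(base) for base in "acgt"}
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: report 0.0 for an empty sequence instead of raising
    # float() keeps the division fractional under Python 2 as well
    return (counts["g"] + counts["c"]) / float(total)
```

For instance, `gc_content("GGCCAATT\n")` gives `0.5`, because four of the eight bases are G or C and the newline is ignored.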

2 Answers

1

I think you want this:

import os

# Python 2: raw_input returns the typed text as a string
seq_id = raw_input('Please enter the file number ID: ')
# Entering e.g. "365046." yields "AY365046.1.txt"
filename = 'AY{0}1.txt'.format(seq_id)

if not os.path.exists(filename):
    print "Error: the file doesn't exist"
else:
    g, a, c, t = 0, 0, 0, 0
    with open(filename, 'r') as f:
        next(f)  # skip the first (description) line
        for line in f:
            for char in line.lower():
                if char == 'g':
                    g += 1
                elif char == 'a':
                    a += 1
                elif char == 'c':
                    c += 1
                elif char == 't':
                    t += 1

    print "Guanina: {0}".format(g)
    print "Adenina: {0}".format(a)
    print "Citosina: {0}".format(c)
    print "Timina: {0}".format(t)

    # Adding 0. forces floating-point division in Python 2
    gc = (g+c+0.) / (a+t+c+g+0.)

    print "Conteúdo GC: {0}".format(gc)

But I think that you should explain the expected behavior of your code because it is not really clear.
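
If the goal is the flow described in the question (the user types `sequence2` and the program reads `sequence2.txt`), a sketch along these lines might be closer to it. The helper names `count_bases` and `report`, the `directory` parameter and the output wording are all assumptions made for illustration, not something from the question.

```python
import os


def count_bases(path):
    """Count a, c, g and t in the file at `path`, skipping its first line."""
    counts = {"a": 0, "c": 0, "g": 0, "t": 0}
    with open(path) as handle:
        next(handle, None)  # skip the description line; no error on an empty file
        for line in handle:
            for char in line.lower():
                if char in counts:
                    counts[char] += 1
    return counts


def report(name, directory="."):
    """Return a summary for `<name>.txt`, or an error message if it is missing."""
    path = os.path.join(directory, name + ".txt")
    if not os.path.isfile(path):
        return "Error: no file found for sequence '{0}'".format(name)
    counts = count_bases(path)
    total = sum(counts.values())
    gc = (counts["g"] + counts["c"]) / float(total) if total else 0.0
    return "G: {g}  A: {a}  C: {c}  T: {t}  GC content: {gc:.3f}".format(gc=gc, **counts)
```

In Python 3 the interactive part would then be `print(report(input("Sequence name: ")))`; in Python 2, `raw_input` takes the place of `input`. Building the file name from the typed text avoids needing one variable and one if/else branch per file.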

  • He's trying to calculate the %GC of a DNA string, i.e. the percentage of all letters that are G or C in a string consisting of A, C, G and T. – vds Aug 14 '15 at 09:12
  • It works, thanks! But how can I skip the first line of the file now? I was using gene.readline() for that... – Ricardo Aug 14 '15 at 09:13
  • @Ricardo I've added the line skip; it is done with `next(f)`. Also keep in mind that everything you asked has already been answered elsewhere! Search engines are your best friend :), good luck! –  Aug 14 '15 at 09:17
  • @vds is right... I need to skip the first line of the file because of that... The file comes from NCBI database, and the first line is a brief definition of the gene... And I don't want to get the G's, A's, C's and T's on this, only on the rest of the file... – Ricardo Aug 14 '15 at 09:20
  • Thanks again, @rsm! :) – Ricardo Aug 14 '15 at 09:20
  • @rsm But now, with the `next(f)`, I am getting this error: `Traceback (most recent call last): File "gc.py", line 15, in for line in f.readlines(): ValueError: Mixing iteration and read methods would lose data` – Ricardo Aug 14 '15 at 09:27
  • @Ricardo Sorry, I've made the correction. (I didn't add vds's suggestion about using `count`, but you should!) –  Aug 14 '15 at 09:36
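
The `ValueError` quoted in the comments above appears in Python 2 when `next(f)` is followed by a read method such as `f.readlines()` on the same file. One way to avoid mixing the two is to stay with iteration throughout, for example with `itertools.islice`. The function name `sequence_lines` is hypothetical, and this is just a sketch.

```python
from itertools import islice


def sequence_lines(path):
    """Return the lines of `path` after the first one, without trailing newlines."""
    with open(path) as handle:
        # islice(handle, 1, None) skips one line and then yields the rest,
        # using only iteration, so no read method is mixed in
        return [line.rstrip("\n") for line in islice(handle, 1, None)]
```

The returned list can then be joined and counted in one go, for example `"".join(sequence_lines("AY365046.1.txt")).lower().count("g")`.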
-1

The problem I see in your code is that you only read one line from the text file. The code below returns a list of all the lines in the file, so you can iterate over it the way the rest of your code does.

with open("AY365046.1.txt","r") as f:
    lines = f.readlines()

You can read more about the file object in the manual

Jonathan
  • No, the iteration over the file (`gene`) actually reads the rest of the lines. Perhaps the first `readline` is just to skip the first line? – skyking Aug 14 '15 at 08:52
  • I'm not really sure about his "readline()" but he's mostly asking about using `raw_input` to get the file names –  Aug 14 '15 at 08:55
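
The point made in the comments above, that a single `readline()` only consumes the first line and a following `for` loop picks up from the second, can be checked with a small in-memory example. `io.StringIO` stands in for a real file here and the sample text is invented.

```python
import io

# An in-memory stand-in for a sequence file: one header line, two data lines
handle = io.StringIO(u"header line\nGGCC\nAATT\n")

first = handle.readline()                 # consumes only the first line
rest = [line.strip() for line in handle]  # iteration continues from the second line
```

After this runs, `first` holds the header and `rest` holds the two sequence lines, which matches how the question's original code behaves.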