-1

I'm writing a program that reads a sequence of bases and outputs the reverse complement sequence: it should reverse the sequence and replace all As with Ts, all Cs with Gs, all Gs with Cs, and all Ts with As. I'm having trouble getting it to work, so can anyone please help me by having a look at my code:

word = raw_input("Enter sequence: ")
a = word.replace('A', 'T')
b = word.replace('C', 'G')
c = word.replace('G', 'C')
d = word.replace('T', 'A')
if a == word and b == word and c == word and d == word:
    print "Reverse complement sequence: ", word

And I want this sort of output:

Enter sequence: CGGTGATGCAAGG
Reverse complement sequence: CCTTGCATCACCG

Regards

behnam
jaddy123

3 Answers

5

I would probably do something like:

word = raw_input("Enter sequence:")

# build a dictionary to know what letter to switch to
swap_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# find out what each letter in the reversed word maps to and then join them
newword = ''.join(swap_dict[letter] for letter in reversed(word))

print "Reverse complement sequence:", newword

I don't quite understand your if statement, but the above code avoids needing one by looping over each letter, deciding what it should become, and then combining the results. That way each letter only gets converted once.

Edit: oops, I didn't notice that you wanted to reverse the string too. Fixed.
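For reference, here is a minimal sketch of the same dictionary approach wrapped in a function, written for Python 3 rather than the Python 2 used above. The names `SWAP` and `reverse_complement` are made up for illustration, and raising `ValueError` on a letter outside A/C/G/T is just one possible choice, not something the question specifies.

```python
# Hypothetical names; the mapping mirrors swap_dict from the answer above.
SWAP = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    try:
        # Walk the sequence backwards and look up each base's complement.
        return ''.join(SWAP[base] for base in reversed(seq))
    except KeyError as err:
        # Assumption: treat any other letter as an input error.
        raise ValueError("unexpected base: %s" % err.args[0])

print(reverse_complement("CGGTGATGCAAGG"))  # CCTTGCATCACCG
```

If unknown letters should pass through unchanged instead, `SWAP.get(base, base)` in place of `SWAP[base]` would do that.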

DSM
  • In your code this error is coming up!! : Traceback (most recent call last): File "C:\Python26\python code\week 4\Q9", line 3, in newword = ''.join(swap_dict[letter] for letter in reversed(word)) File "C:\Python26\python code\week 4\Q9", line 3, in newword = ''.join(swap_dict[letter] for letter in reversed(word)) KeyError: 'S' – jaddy123 Sep 02 '12 at 04:05
  • Ah. That `KeyError` is saying that S isn't in the swap dictionary, so it doesn't know what to do with it. What do you want to happen with the letter S? – DSM Sep 02 '12 at 04:34
3

Your code as written is problematic, because steps 1 and 4 are the opposite of each other. Thus they can't be done in completely separate steps: you convert all As to Ts, then convert those (plus the original Ts) to As in step 4.

For something simple, built-in, and (hopefully) efficient, I'd consider using translation tables from the string module:

import string
sequence = "ATGCAATCG"
trans_table = string.maketrans( "ATGC" , "TACG")
new_seq = string.translate( sequence.upper() , trans_table )
print new_seq

This gives the complement of the sequence (not yet reversed):

'TACGTTAGC'

Although I doubt that your users will ever forget to capitalize all letters, it's good practice to ensure that the input is in the form expected; hence the use of sequence.upper(). Any letters/bases with conversions not included in the translation table will be unaffected:

>>> string.translate( "AEIOUTGC" , trans_table )
'TEIOUACG'

As for the reverse complement sequence? You can do that concisely using slice notation on the output string, with a step of -1:

>>> new_seq[::-1]
'CGATTGCAT'
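The two steps above can be combined into one function. This is a sketch for Python 3, where `string.maketrans` no longer exists and `str.maketrans` with the `str.translate` method is used instead; the names `TRANS_TABLE` and `reverse_complement` are assumptions for illustration.

```python
# Python 3: build the table with str.maketrans instead of string.maketrans.
TRANS_TABLE = str.maketrans("ATGC", "TACG")

def reverse_complement(sequence):
    # Upper-case the input, complement it via the table,
    # then reverse the result with a slice step of -1.
    return sequence.upper().translate(TRANS_TABLE)[::-1]

print(reverse_complement("ATGCAATCG"))  # CGATTGCAT
```

As before, letters that are not in the table are left unchanged.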
abought
  • Steps #1 and #4 actually *aren't* the opposites of each other. I thought that at first too, but the LHS of #1 is `a` and the LHS of #4 is `d`, so there's actually no double-conversion going on. – DSM Sep 02 '12 at 03:49
  • Ah, you're right- good call. The original word isn't being modified, but rather the result of each separate substitution is stored in a separate variable. I'm not sure why this design was chosen, but it occurs to me that having 4 separate variables that each contain some modification of a very long sequence might lead to memory usage issues. – abought Sep 02 '12 at 03:55
1

So if I understand what you want to do, you want to swap all Ts and As as well as swap all Gs and Cs and you want to reverse the string.

OK, well first, let's work on reversing the string, something you don't have implemented. Unfortunately, there's no obvious way to do it, but this SO question about how to reverse strings in Python should give you some ideas. The best solution seems to be

reversedWord = word[::-1]

Next, you need to swap the letters. You can't call replace("T", "A") and replace("A","T") on the same string because that will make all your As and Ts end up as T. You seem to have recognized this, but you use separate strings for each swap and don't ever combine them. Instead you need to go through the string one letter at a time and check each one. Something like this:

swappedWord = "" #start swapped word empty
for letter in word: #for every letter in word
    if letter  == "A": #if the letter is "A"
        swappedWord += "T" #add a "T"
    elif letter  == "T": #if it's "T"
        swappedWord += "A" #add an "A"
    elif letter  == "C": #if it's "C"
        ... #you get the idea

    else: #if it isn't one of the above letters
        swappedWord += letter #add the letter unchanged

(EDIT - DSM's dictionary based solution is better than my solution. Our solutions are very similar though in that we both look at each character and decide what the swapped character should be but DSM's is much more compact. However, I still feel my solution is useful for helping you understand the general idea of what DSM's solution is doing. Instead of my big if statement, DSM uses a dictionary to quickly and simply return the proper letter. DSM also collapsed it into a single line.)

The reason why your if statement isn't working is that you're basically saying "if a, b, c, d, and word are all exactly the same" since == means "are equal" and if a is equal to word and b is equal to word then a must be equal to b. This can only be true if the string has no As, Ts, Cs, or Gs (i.e. word is unchanged by the swaps), so you never print out the output.
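To make this concrete, here is a small Python 3 sketch. The helper name `all_unchanged` is made up for illustration; its body repeats the four independent replacements and the condition from the question.

```python
def all_unchanged(word):
    # Same four independent replacements as in the question;
    # each one starts from the original word, not from the previous result.
    a = word.replace('A', 'T')
    b = word.replace('C', 'G')
    c = word.replace('G', 'C')
    d = word.replace('T', 'A')
    # True only if none of the replacements changed anything.
    return a == word and b == word and c == word and d == word

print(all_unchanged("CGGTGATGCAAGG"))  # False, so the question's print never runs
print(all_unchanged("XYZ"))            # True, no A, C, G or T to replace

# Chaining the replacements on one string does not work either:
print("AT".replace('T', 'A').replace('A', 'T'))  # TT
```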

acattle
  • I am writing the code like this: word = raw_input("Enter sequence: ") swappedWord = "" for letter in word: if letter == "A": swappedWord += "T" elif letter == "T": swappedWord += "A" elif letter == "C": else: swappedWord += letter print "Reverse complement sequence: ", word – jaddy123 Sep 02 '12 at 04:18
  • your code is giving me this output: Enter sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG Reverse complement sequence: CGGTGATGCAAGG – jaddy123 Sep 02 '12 at 04:29
  • and I want it to give me this output:Enter sequence: CGGTGATGCAAGG Reverse complement sequence: CCTTGCATCACCG – jaddy123 Sep 02 '12 at 04:29
  • @jaddy123 #1, it looks like you put your output somewhere inside the for loop; put it outside. #2, you're printing `word`, not `swappedWord`. #3, I explicitly stated that DSM's solution for swapping the letters was better and that my solution should only be used to help you understand DSM's solution. #4, I just noticed that you didn't complete the if statement either. I didn't write the whole thing because I thought you could see the pattern. You need to add the C and G swaps to the if statement. – acattle Sep 02 '12 at 04:36