I'm a hobbyist programmer (my actual major is biology), so I apologize if the code is atrocious.

Anyway, I'm doing a rosalind.info exercise (http://rosalind.info/problems/subs/) that wants me to find every index where a specific DNA motif is contained within a larger DNA sequence. Basically, I need to find the indexes of a substring in a string. Should be easy, right? Well, maybe you can help me.

So here's my code:

with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()
    break

def finder(x, y):
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)

print(finder(seq, subs))

And here's my output:

sequence is:  ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT

motif is: CTCTTTTCT

-1

***Repl Closed***

I left the ***Repl Closed*** in there in an effort to leave no stone unturned. Maybe it has something to do with Sublime REPL?

Anyway, you probably can't tell just by looking, but the motif is actually found MANY times in the DNA sequence, it's just the find function isn't picking up on it. What gives?

GT.
  • Maybe you could print seq and subs after the while loop. Actually, I don't understand the while loop since it will break at first iteration. – Jérôme Nov 26 '15 at 13:33
  • This code won't run because the indentation doesn't make sense. Indentation matters in python, even when you are posting it on SO. – khelwood Nov 26 '15 at 13:35
  • `seq.strip()` doesn't change `seq`. It returns a string which you are discarding. – interjay Nov 26 '15 at 13:36
  • Do you know how to use a debugger? Which IDE are you using? Maybe you should use PyCharm and get familiar with the debugger. – JDurstberger Nov 26 '15 at 13:38
  • @Jérôme Yeah the while loop makes no sense. I forget what I was thinking with that. I edited it. Thanks. Check the OP. – GT. Nov 26 '15 at 13:41
  • @Altoyr I'm using Sublime Text. Also I have no idea how to use a debugger :( Where can I read up on pycharm? – GT. Nov 26 '15 at 13:43
  • @interjay Oh really? That might be why it's not working then. Thanks. I will try this. – GT. Nov 26 '15 at 13:44
  • @interjay Yup that was the problem. Thanks so much everyone! Sorry I'm a noob. I really appreciate all your help though. – GT. Nov 26 '15 at 13:45
  • @khelwood the indentations made sense in my original code, but the process of copying and pasting messed it up. Sorry. I'll check that next time. – GT. Nov 26 '15 at 13:46
  • PyCharm Community Edition is free to use. A debugger lets you go through your program step by step to find bugs. Just google it ;) – JDurstberger Nov 26 '15 at 13:50
  • @Altoyr Damn bro it's like $100/yr :( – GT. Nov 26 '15 at 13:56
  • No Bro it's not Bro, the Community Edition is free bro. Check out this link bro https://www.jetbrains.com/pycharm/download/ – JDurstberger Nov 26 '15 at 13:58
  • Oh cool thanks! Sorry, I guess you don't like people calling you bro :( – GT. Nov 26 '15 at 14:00

2 Answers

`break` is not valid inside a `with` block; it is only allowed inside a loop. Remove it and try again. I have tested the code below.

with open('rosalind_subs.txt') as f:
    # strip() returns a new string, so assign the result back
    seq = f.readline().strip()
    subs = f.readline().strip()

def finder(x, y):
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)

print(finder(seq, subs))

The output is

>>> 
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT

subs is: CTCTTTTCT
15
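
As interjay's comment on the question points out, `str.strip()` returns a new string and leaves the original untouched, so a trailing newline left on the motif is enough to make `find` return -1. A minimal sketch of the difference, using a made-up line and a made-up short sequence rather than the real input file:

```python
# Hypothetical line, shaped like what readline() returns (trailing newline included).
line = "CTCTTTTCT\n"

line.strip()             # the stripped copy is thrown away here
unchanged = line         # so line still ends with "\n"

stripped = line.strip()  # keeping the return value is what removes the newline

# Hypothetical short sequence that contains the motif once.
demo_seq = "ACCTCTTTTCTA"

print(repr(unchanged))            # still has the newline
print(repr(stripped))             # newline gone
print(demo_seq.find(unchanged))   # no match, because of the newline
print(demo_seq.find(stripped))    # match found
```

Whether the newline is actually present on the second line depends on how the input file ends, which is probably why the same code can print -1 on one machine and 15 on another.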
Elixir Techne

Also a fellow biologist here who has done several rosalind.info exercises.

First off, your code to read in the sequence and the motif could be improved by using splitlines(), which takes care of removing the newline. Also notice how I use tuple unpacking to assign both the seq and motif variables at once.

with open('rosalind_subs.txt') as f:
    seq, motif = f.read().splitlines()

Next, you correctly noticed that find only returns the index of the first occurrence of your motif. To find all the occurrences, it helps to know that find takes another optional argument start. If you provide that, it starts to look from that index position. Use this in a loop and you get all your indexes.
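
A minimal sketch of that loop could look like the following. The function name `find_all` and the short test sequence are only illustrative, and the positions are 0-based, so add 1 to each if the exercise expects 1-based positions.

```python
def find_all(seq, motif):
    """Return the 0-based start index of every match, overlapping ones included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character after the last hit so overlapping matches are kept.
        start = seq.find(motif, start + 1)
    return positions


# Short illustrative sequence in which two of the matches overlap.
hits = find_all("GATATATGCATATACTT", "ATAT")
print(hits)                                 # 0-based positions
print(" ".join(str(i + 1) for i in hits))   # the same positions, 1-based
```

Advancing by one character instead of by `len(motif)` is what keeps overlapping matches in the result.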

Another approach is to use regular expressions. Beware that motifs can overlap each other, so you need to make use of a lookahead assertion.
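
Here is a rough sketch of the regex version, again with an illustrative function name and test sequence. Wrapping the motif in `(?=...)` makes each match zero-width, so the regex engine does not consume the motif and can report a second match that starts inside the first one.

```python
import re


def find_all_regex(seq, motif):
    """Return the 0-based start index of every match, overlapping ones included."""
    # re.escape is not strictly needed for plain DNA letters, but it keeps the
    # pattern safe if the motif ever contains regex metacharacters.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]


seq = "GATATATGCATATACTT"
motif = "ATAT"

# Without the lookahead, the overlapping match is skipped.
plain = [m.start() for m in re.finditer(re.escape(motif), seq)]
overlapping = find_all_regex(seq, motif)

print(plain)
print(overlapping)
```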

BioGeek