1

I'm looking for an algorithm for a script that will search a long DNA sequence, stored in a str object, for specified motifs (shorter DNA fragments), count the occurrences (my sequence may contain several identical motifs), and print the position in the sequence of the first nucleotide of each detected motif.

I assume the searches shown below should go inside some kind of loop, because both examples can detect a motif only once. What is the proper way to write such a loop?

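# Load data: the first line of the file is the sequence, the second is the motif
with open('motif.txt', 'r') as handle:
    chains = [line.strip() for line in handle]
Seq, Motif = chains[0], chains[1]
count = 0  # intended to hold the number of occurrences (not updated yet)

# Search motif: str.find returns the 0-based index of the first match, or -1
position = Seq.find(Motif)
if position != -1:
    print("%s has been detected" % (Motif))

# Membership test: only tells whether the motif occurs at least once
if Motif in Seq:
    print("%s has been detected" % (Motif))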
user3470313

3 Answers

2

Is there something that already does this, perhaps Biopython? In any case, it is not that hard, and you don't need an explicit loop:

import re

seq = 'aaattatagggatatata'
motif = 'ata'

# Compile the motif once, then collect the 0-based start of every non-overlapping match
Q = re.compile(motif)
[item.start(0) for item in Q.finditer(seq)]  # use item.start(0) + 1 for 1-based positions
#Out[23]: [5, 11, 15]
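
If an explicit loop is preferred, as the question asks, the search could be sketched with repeated calls to str.find, each starting one character past the previous hit. This is only a minimal sketch: the helper name find_motif is hypothetical, and it assumes 1-based positions are wanted. Unlike finditer on the plain motif, it also reports overlapping occurrences (the 'ata' starting at index 13 is skipped in the output above because it overlaps the match at index 11).

```python
def find_motif(seq, motif):
    """Return the 1-based start position of every occurrence of motif in seq.

    Overlapping occurrences are included. find_motif is a hypothetical helper.
    """
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)   # resume one character further on
    return positions


seq = 'aaattatagggatatata'
motif = 'ata'
hits = find_motif(seq, motif)
print("%s found %d times at positions %s" % (motif, len(hits), hits))
# prints: ata found 4 times at positions [6, 12, 14, 16]
```

The count is simply the length of the returned list, so no separate counter variable is needed.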
CT Zhu
1

The question sounds a little ambiguous to me in its terminology. Since you say you are looking for "motifs", I would like to ask whether you are really trying to find an exact sequence or whether your purpose is to search for transcription factor binding sites (TFBS). If you are looking for exact occurrences of a specific string, then @CT Zhu's answer is the right one for you.

However, if you are looking for TFBSs, that might not be as trivial as looking for an exact sequence, since these sites are degenerate: they do not always correspond to the same sequence, although they share some patterns. In this case I would suggest taking a look at motif databases such as JASPAR or TRANSFAC; the Biopython "motifs" module could also be a good starting point: http://biopython.org/DIST/docs/api/Bio.motifs-module.html
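
To illustrate what "degenerate" means in practice, here is a rough sketch that expands a motif written with IUPAC ambiguity codes into a regular expression. It is not how JASPAR, TRANSFAC or Biopython model binding sites (those typically rely on position weight matrices); the helper find_degenerate and the example motif 'RGG' are assumptions made purely for illustration.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes
IUPAC = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': '[AG]', 'Y': '[CT]', 'S': '[CG]', 'W': '[AT]',
    'K': '[GT]', 'M': '[AC]', 'B': '[CGT]', 'D': '[AGT]',
    'H': '[ACT]', 'V': '[ACG]', 'N': '[ACGT]',
}


def find_degenerate(seq, motif):
    """Return 0-based starts of all (possibly overlapping) matches of an IUPAC motif.

    find_degenerate is a hypothetical helper, not part of any library.
    """
    body = ''.join(IUPAC[base] for base in motif.upper())
    # The lookahead keeps matches zero-width so that overlapping hits are found
    pattern = re.compile('(?=' + body + ')')
    return [m.start() for m in pattern.finditer(seq.upper())]


# 'R' stands for A or G, so 'RGG' matches both 'AGG' and 'GGG'
print(find_degenerate('aaattatagggatatata', 'RGG'))
# prints: [7, 8]
```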

More sophisticated approaches for motif finding can be found in the literature: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003214#pcbi-1003214-g008

cnluzon
0

I am adding this as another answer because I am not allowed to comment.

I think you can find the answer in this question: Python regex find all overlapping matches?
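
For reference, a common way to get overlapping matches with the re module is to wrap the motif in a zero-width lookahead. The following is a minimal sketch of that idea, not necessarily the exact code from the linked question; the function name overlapping_starts is hypothetical.

```python
import re


def overlapping_starts(seq, motif):
    """Return the 0-based start of every occurrence of motif, overlaps included."""
    # (?=...) matches at each position where the motif begins without consuming
    # any characters, so the scan can report hits that overlap one another.
    pattern = re.compile('(?=' + re.escape(motif) + ')')
    return [m.start() for m in pattern.finditer(seq)]


positions = overlapping_starts('aaattatagggatatata', 'ata')
print(len(positions), positions)
# prints: 4 [5, 11, 13, 15]
```

Compared with a plain finditer on the motif, this also reports the occurrence at index 13, which overlaps the one at index 11.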

cnluzon