
The question provides a long string and a substring. I have to write code that finds every occurrence of the substring in the long string and outputs the start positions of the matches. For example:

Sample Dataset
GATATATGCATATACTT
ATAT

Sample Output
2 4 10

So I have written the code shown below. However, I noticed that it skips position 4 in the sample dataset, apparently because the match at position 4 overlaps the match at position 2.

Please show me how I can solve this problem. Thanks sooooo much in advance!!!

import re
filename = open(input())
txt=filename.readline()
rlist=[]
text= "ATAT"
for m in re.finditer (text, txt):
    d = m.start()
    d += 1
    rlist.append(d)
print (rlist)
bart cubrich
Danny Xu
  • Possible duplicate of [Overlapping count of substring in a string in Python](https://stackoverflow.com/questions/32283255/overlapping-count-of-substring-in-a-string-in-python) – Austin Apr 08 '19 at 16:50
  • Actually, I don't think this is a duplicate. The question could benefit from an edit, though. Maybe "How to find the start positions of a substring in a text string?" – bart cubrich Apr 08 '19 at 17:09
  • Also, welcome to SO, Danny. It is okay to ask homework questions, but consider phrasing the question more generally when you ask it, and then admitting it is homework in the body. Also, I provided a full answer here, but it is not uncommon for people to answer with pseudocode if you say it's homework. – bart cubrich Apr 08 '19 at 17:24

2 Answers

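txt = "GATATATGCATATACTT"   # sample data from the question
text = "ATAT"
length = len(text)
rlist = []
for i in range(len(txt)):
    # only compare where a full-length window still fits
    # (<= rather than <, so a match at the very end of txt is not missed)
    if i + length <= len(txt):
        if txt[i:i+length] == text:
            rlist.append(i + 1)   # report 1-based positions
print(rlist)

Try this! The first if condition keeps the comparison window inside the string. Python slicing would not actually raise an index error here, but the check skips windows that are too short to match.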
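
The sliding-window comparison described above can also be wrapped in a function. This is only a minimal sketch: the name `window_positions` is hypothetical and not from the original answer, and it assumes 1-based positions are wanted, as in the sample output.

```python
def window_positions(txt, text):
    """Return 1-based start positions of every match, overlapping ones included."""
    length = len(text)
    # The range stops at the last index where a full-length window still fits,
    # so no separate bounds check is needed.
    return [i + 1
            for i in range(len(txt) - length + 1)
            if txt[i:i + length] == text]


print(window_positions("GATATATGCATATACTT", "ATAT"))  # prints [2, 4, 10]
```

Because every index is tested, the work grows with the length of `txt` times the length of `text`, which is fine for inputs of this size.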

mayur nandu

This will work without using re. Note that I commented out your file import portion so that you can test just the snippet where the text is found.

#filename = open(input())
#txt=filename.readline()
txt='GATATATGCATATACTTATAT'
text= "ATAT"
index=0
rlist=[]
while index < len(txt):
    index = txt.find(text, index)   # jump straight to the next match at or after index
    if index == -1:                 # no more matches
        break
    rlist.append(index+1)           # store the 1-based position
    index += 1                      # step one character past this match's start so overlaps are found

print(rlist)
Out: [2, 4, 10, 18]

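I based this answer on this. What is cool about this solution is that the loop needs to run at most n+1 times, where n is the number of matches in the text: each call to find jumps straight to the next match, and the final call returns -1 and ends the loop.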
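
For reuse, the `find` loop can be packaged as a function, sketched below next to a regex variant for comparison. Both function names are hypothetical. The regex version uses a different technique from the answer above: it wraps the pattern in a zero-width lookahead, so `re.finditer` consumes no characters and can report overlapping matches. Both assume a non-empty search string and return 1-based positions.

```python
import re


def find_positions(txt, text):
    """Overlapping matches via str.find, returned as 1-based positions."""
    positions = []
    index = txt.find(text)
    while index != -1:
        positions.append(index + 1)
        # Resume one character after the last match start, not after its end.
        index = txt.find(text, index + 1)
    return positions


def lookahead_positions(txt, text):
    """Overlapping matches via a zero-width lookahead, 1-based positions."""
    pattern = "(?=" + re.escape(text) + ")"
    return [m.start() + 1 for m in re.finditer(pattern, txt)]


print(find_positions("GATATATGCATATACTT", "ATAT"))       # prints [2, 4, 10]
print(lookahead_positions("GATATATGCATATACTT", "ATAT"))  # prints [2, 4, 10]
```

The original `re.finditer(text, txt)` call skipped position 4 because regex matches are non-overlapping by default: after matching at position 2, the search resumes after the end of that match.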

bart cubrich