I'm working on a problem where I need to find various string repeats after encountering an initial string, say ACTGAC. The data file has sequences that look like:

AAACTGACACCATCGATCAGAACCTGA

In that string, once I find ACTGAC, I need to analyze the next 10 characters for string repeats that follow some rules. I have the rules coded, but can anyone show me how, once I find the string I need, I can make a substring of the next ten characters to analyze? I know that the str.partition function can do that once I find the string, and then [1:10] can get the next ten characters.

Thanks!

dhillonv10

2 Answers

You almost have it already (but note that indexes start counting from zero in Python).

The partition method splits a string into head, separator, and tail, based on the first occurrence of the separator.

So you just need to take a slice of the first ten characters of the tail:

>>> data = 'AAACTGACACCATCGATCAGAACCTGA'
>>> head, sep, tail = data.partition('ACTGAC')
>>> tail[:10]
'ACCATCGATC'

Python allows you to leave out the start-index in slices (it defaults to zero, the start of the string), and also the end-index (it defaults to the length of the string).
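
As a small illustration of those slice rules (a sketch reusing the data from the example above; the variable names are only for illustration), compare the zero-based slice with the [1:10] slice from the question, and note that slicing past the end of a string does not raise an error:

```python
data = 'AAACTGACACCATCGATCAGAACCTGA'
tail = data.partition('ACTGAC')[2]

first_ten = tail[:10]     # same as tail[0:10]: the ten characters after the separator
off_by_one = tail[1:10]   # starts at index 1, so it skips a character and yields only nine
short = 'AAA'[:10]        # fewer than ten characters left: the slice is simply shorter

print(first_ten)   # ACCATCGATC
print(off_by_one)  # CCATCGATC
print(short)       # AAA
```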

Note that you could also do the whole operation in one line, like this:

>>> data.partition('ACTGAC')[2][:10]
'ACCATCGATC'
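
One thing the one-liner hides: if the separator is not in the string, partition returns the whole string as the head and empty strings for the separator and tail, so the slice is just ''. If you need to tell "not found" apart from "found, but nothing follows", a minimal sketch could look like this (window_after and its width parameter are hypothetical names, not part of the standard library):

```python
def window_after(data, sep, width=10):
    """Return up to `width` characters following the first `sep`, or None if `sep` is absent."""
    head, found, tail = data.partition(sep)
    if not found:
        # partition returns (data, '', '') when sep does not occur
        return None
    return tail[:width]


print(window_after('AAACTGACACCATCGATCAGAACCTGA', 'ACTGAC'))  # ACCATCGATC
print(window_after('AAAA', 'ACTGAC'))                         # None
print(window_after('ACTGACGG', 'ACTGAC'))                     # GG
```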
ekhumoro
  • Just bear in mind that str.partition() will "Split the string at the **first** occurrence of sep". If there are multiple or overlapping occurrences of the separator, have a look here: http://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python – HongboZhu Apr 10 '12 at 11:45

So, based on marcog's answer in "Find all occurrences of a substring in Python", I propose the following, which also catches overlapping occurrences of the separator:

>>> import re
>>> data = 'AAACTGACACCATCGATCAGAACCTGAACTGACTGACAAA'
>>> sep = 'ACTGAC'
>>> [data[m.start()+len(sep):][:10] for m in re.finditer('(?=%s)'%sep, data)]
['ACCATCGATC', 'TGACAAA', 'AAA']
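
If you would rather not use a regex, roughly the same result can be had with str.find in a loop. This is only a sketch (windows_after_all and width are hypothetical names); advancing the search by one character each time is what lets it pick up overlapping matches, and since no pattern is compiled the separator never needs escaping:

```python
def windows_after_all(data, sep, width=10):
    """Yield up to `width` characters after every occurrence of `sep`, overlapping ones included."""
    start = data.find(sep)
    while start != -1:
        end = start + len(sep)
        yield data[end:end + width]
        # Resume one character later so overlapping occurrences are not skipped.
        start = data.find(sep, start + 1)


data = 'AAACTGACACCATCGATCAGAACCTGAACTGACTGACAAA'
print(list(windows_after_all(data, 'ACTGAC')))
# ['ACCATCGATC', 'TGACAAA', 'AAA']
```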
HongboZhu