How to get the position of each character in a string with Python?

Question

How can I get the position of a matched substring (a short motif) inside a longer string (a FASTA sequence) in Python?

I am reading a FASTA file as a string and searching it for a motif with the regular expression '[AGCT][TG][TC][GT]TG'. Along with each match, I also want to record the position at which the motif occurs in the string.

import re

rdict = dict([ (x[1],x[0]) for x in enumerate(Seq) ])  # maps each character to its last index
motif = '[AGCT][TG][TC][GT]TG'
#for match in Seq:
matches = re.findall(motif, Seq.upper())  # list of matched substrings, without positions
print(matches)
Seq.index(matches)

The code above finds the motif but returns the position of only one character. How can I change it to give the start and end positions of each motif match (the short substring)?

If you know the position of one character, you also know the length of the match is 6, so what can't you do? — , Aug 01 '19 at 16:51
Maybe `matches = [x.span() for x in re.finditer(motif, Seq.upper())]`? — Wiktor Stribiżew, Aug 01 '19 at 16:56
`iter = re.finditer(motif, Seq.upper())` then `indices = [m.start(0) for m in iter]` — , Aug 01 '19 at 16:57
See https://stackoverflow.com/questions/2674391/python-locating-the-position-of-a-regex-match-in-a-string/16360404 to get some ideas on how you can do this. — , Aug 01 '19 at 16:59
Yes, please let know if https://stackoverflow.com/a/16360404/3832970 answers your question. — Wiktor Stribiżew, Aug 01 '19 at 16:59
It's basically calling the `start()` method of the match object. You have access to the matched substring and its position. Create your arrays, maybe an array of arrays. — , Aug 01 '19 at 17:02
@sln .. thanks for the links, but findall is the only option that has worked with FASTA sequences so far. I tried finditer and re.search, but they have issues with a list of strings. — Kay, Aug 01 '19 at 17:58
@WiktorStribiżew .. thanks, but re.search isn't good with lists. — Kay, Aug 01 '19 at 18:00
Also, I tried something like ```binding = [] index = [] #print(matches) for match in Seq: matches = re.findall(motif, Seq.upper()) for char in matches: pos = Seq.index(matches[0]) if len(matches) > 0: dataframe = pd.DataFrame({'index':pos, 'binding':matches }) binding.append(matches) index.append(pos) print(len(matches)) dataframe.head()``` but the second loop with index is stuck at the first position. Any suggestions? — Kay, Aug 01 '19 at 18:00
@Kay *re.search isn't good with lists* - I have nowhere advised to use `re.search`. What is your exact input? What is your exact expected output ? — Wiktor Stribiżew, Aug 01 '19 at 20:02
@WiktorStribiżew The input is FASTA sequences that look like this:
```
Seq=GGAGGGAGAAGCAGCCTGAACCGGGCTGGTCTCTCTGGGATTGGAGAGAAAGGTGGCGGAGaGCGGCGGGGGTGGGGGG
```
and the expected output is:
```
+------+-------+---------+
|      | start | binding |
+------+-------+---------+
| 0    | 210   | GGCTTG  |
| 1    | 317   | TTTTTG  |
| 2    | 389   | GGCGTG  |
| .... | ..    | ....    |
| .... | ..    | ....    |
| 3    | 810   | CGCGTG  |
| 4    | 810   | CTCTTG  |
+------+-------+---------+
```
— Kay, Aug 01 '19 at 21:47

DjaouadNM · Answer 1 · 2019-08-01T21:24:31.313

0

For multiple matches along with their start and end indices, use finditer instead:

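import re

# finditer yields one match object per non-overlapping match
matches = re.finditer(motif, Seq.upper())

for match in matches:
  string_matched = match[0]     # the matched substring
  start_index = match.start(0)  # index of the first matched character
  end_index = match.end(0)      # index one past the last matched character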
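
As a minimal sketch of the loop above, the matches can be collected into a list of `(start, end, text)` tuples. The helper name `find_motif_positions` and the short `example_seq` are made up for illustration and are not taken from the question's data:

```python
import re

# Motif from the question, compiled once so it can be reused.
MOTIF = re.compile(r"[AGCT][TG][TC][GT]TG")


def find_motif_positions(seq, pattern=MOTIF):
    """Return (start, end, matched_text) for each non-overlapping match.

    `end` is exclusive, so seq.upper()[start:end] equals matched_text.
    """
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(seq.upper())]


# Hypothetical short sequence; mixed case is handled by the upper() call.
example_seq = "aaGGCTTGaattTTTGcc"
hits = find_motif_positions(example_seq)

for start, end, text in hits:
    print(start, end, text)
```

Because `finditer` scans left to right and resumes after each match, overlapping occurrences of the motif are not reported.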

edited Aug 01 '19 at 21:24

answered Aug 01 '19 at 16:57

DjaouadNM


Thanks! But it throws an error: ```ValueError: If using all scalar values, you must pass an index``` – Kay Aug 01 '19 at 17:50
@Kay That's a pandas error, you didn't mention how you're using the above in a dataframe. – DjaouadNM Aug 01 '19 at 21:26
```binding.append(string_matched) start.append(start_index) end.append(end_index) dataframe = pd.DataFrame({ 'binding':binding, 'start':start, 'end':end}) dataframe.head()``` – Kay Aug 01 '19 at 21:33
Thanks for the above. I'm creating lists of matches and indices and then putting them together in a dataframe. – Kay Aug 01 '19 at 21:35
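
The `ValueError` quoted in the comments is raised by pandas when every value in the dict passed to `pd.DataFrame` is a scalar, so one way around it is to collect one record per match first and build the table once at the end. The sketch below does this in plain Python for several sequences; the function name `motif_rows`, the sequence IDs, and the sample sequences are hypothetical:

```python
import re

MOTIF = re.compile(r"[AGCT][TG][TC][GT]TG")


def motif_rows(sequences):
    """Build one dict per motif match across a mapping of id -> sequence."""
    rows = []
    for seq_id, seq in sequences.items():
        # Run finditer on each string separately; it does not accept a list.
        for m in MOTIF.finditer(seq.upper()):
            rows.append(
                {
                    "sequence": seq_id,
                    "start": m.start(),
                    "end": m.end(),
                    "binding": m.group(0),
                }
            )
    return rows


# Hypothetical records standing in for parsed FASTA entries.
sequences = {"seq1": "aaGGCTTGaa", "seq2": "ccTTTTTGcc", "seq3": "AAAAAA"}
rows = motif_rows(sequences)

for row in rows:
    print(row)
```

Assuming pandas is installed, `pd.DataFrame(rows)` turns this list of dicts into a table with `sequence`, `start`, `end`, and `binding` columns.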
