
OK, so I'll explain what I have and what I need to do.

I have a dict with multiple keys, but I'll show just one for this example, plus a string made up of A, T, C and G. (The `[3:]` is meant to indicate that the split should happen at index 3 of the value.)

Dict = {'EcoRV': 'GATATC'[3:]} 
String = 'AAAAGATATCAAAGATATCAAAA'

Here is what I need to do: find the dict value in the string, split the string at that site, and end up with a list of the pieces with nothing removed. In this case it would have to cut at the 'TC' of each site and give:

List = ['AAAAGATA','TCAAAGATA','TCAAAA']

I tried it with split but it loses the 'TC' and I need to keep those.

  • Possible duplicate of [In Python, how do I split a string and keep the separators?](http://stackoverflow.com/questions/2136556/in-python-how-do-i-split-a-string-and-keep-the-separators) – Pit Mar 16 '17 at 10:04
  • @Pit I do not see the connection. – Ma0 Mar 16 '17 at 10:06
  • What is that `[1]` doing there on the dict? – Ma0 Mar 16 '17 at 10:10
  • @Ev.Kounis I glossed over the resulting list, this is in fact not a duplicate of the answer I linked. Sorry! – Pit Mar 16 '17 at 10:11
  • @Ev.Kounis It's an indication for where it should split. It should look for the sequence of letters TC and split between the T and C so the [1] indicates that. – Nathan Weesie Mar 16 '17 at 10:14
  • @NathanWeesie What the `[1]` does is that it indexes your `"TC"` string and converts it to `"C"`. So remove it. – Ma0 Mar 16 '17 at 10:15
  • @NathanWeesie your edited example still won't do what you want it to - Python will evaluate `'GATATC'[3:]` to `'ATC'`, and then you lose the preceding data. I've provided an example dictionary structure in my answer that might work for you. – asongtoruin Mar 16 '17 at 10:37
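To illustrate the point made in the comments above, here is a minimal sketch (Python 3) of what the slice in the dict literal actually stores. The `sites` dict and the cut index `4` are assumptions made for illustration only; the index is inferred from the expected list in the question, not stated by the asker.

```python
# The slice is evaluated when the dict is built, so only the tail is stored.
sliced = {'EcoRV': 'GATATC'[3:]}
print(sliced['EcoRV'])  # ATC

# Hypothetical alternative: keep the full site and the cut index side by side.
# The index 4 is an assumption inferred from the expected output in the question.
sites = {'EcoRV': ('GATATC', 4)}
site, cut = sites['EcoRV']
print(site[:cut], site[cut:])  # GATA TC
```

Storing the index separately keeps the whole recognition site available for searching, which the sliced version cannot do.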

3 Answers

You've seen how split can work. How about inserting an arbitrary character that never occurs in your string between the letters you want to separate, and then splitting on that character:

test_str = 'AAATTTCCCGGGTCGGGAAA'
print(test_str.replace('TC', 'T:C').split(':'))

prints ['AAATTT', 'CCCGGGT', 'CGGGAAA'].

If you want to extend this further using your dictionary, you can change the replace parameters to use your dictionary values with string formatting. For example:

temp_dict = {'Testenzyme': 'TC',
             'Asongtoruinzine': 'GA'}

test_str = 'AAATTTCCCGGGTCGGGAAA'

out_dict = dict()

for key, val in temp_dict.items():
    out_dict[key] = test_str.replace(val, '{}:{}'.format(val[0], val[1])).split(':')

print(out_dict)

prints {'Asongtoruinzine': ['AAATTTCCCGGGTCGGG', 'AAA'], 'Testenzyme': ['AAATTT', 'CCCGGGT', 'CGGGAAA']} (the order of the keys may differ depending on your Python version).

EDIT: Reading the comments I see you want to specify where to split the string in the dictionary values. It would be easier if you wrote your dictionary values as two-element lists, where the two elements represented the different parts of the string you wanted to split. For example, you could then do the following:

temp_dict = {'Testenzyme': ['T', 'C'],
             'Asongtoruinzine': ['GT', 'C']}

test_str = 'AAATTTCCCGGGTCGGGAAA'

out_dict = dict()

for key, val in temp_dict.items():
    out_dict[key] = test_str.replace(''.join(val), ':'.join(val)).split(':')

print(out_dict)
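As a rough sketch of how the two-element idea might apply to the sequence from the question: the helper name `split_at_site` and its `marker` parameter are made up for this example, and the `'GATA'` / `'TC'` split of the site is an assumption read off the expected list in the question.

```python
def split_at_site(sequence, left, right, marker=':'):
    """Split sequence between left and right wherever left + right occurs."""
    # The marker trick only works if the marker cannot occur in the data.
    if marker in sequence:
        raise ValueError('marker already occurs in the sequence')
    marked = sequence.replace(left + right, left + marker + right)
    return marked.split(marker)


print(split_at_site('AAAAGATATCAAAGATATCAAAA', 'GATA', 'TC'))
# ['AAAAGATA', 'TCAAAGATA', 'TCAAAA']
```

The explicit marker check is a design choice for safety; with pure A/T/C/G input it never triggers.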
asongtoruin

You can use regex:

import re

enzyme = 'TC'
String = 'AAATTTCCCGGGTCGGGAAA'

# with re.split:
print(list(filter(bool, re.split(r'(.*?{})(?={})'.format(enzyme[0], enzyme[1]), String))))

# alternative with re.findall:
print(re.findall(r'.*?{}(?={})|.+$'.format(enzyme[0], enzyme[1]), String))
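For sites longer than two letters, one possible variation is to find each site with `re.finditer` and slice the string at an explicit offset instead of building the cut into the pattern. This is only a sketch: the function names `cut_positions` and `digest` and the offset `4` are assumptions for illustration, with the offset chosen to reproduce the expected list from the question.

```python
import re


def cut_positions(sequence, site, offset):
    """Return the indices where the sequence should be cut."""
    # A lookahead matches without consuming, so overlapping sites are found too.
    pattern = '(?={})'.format(re.escape(site))
    return [m.start() + offset for m in re.finditer(pattern, sequence)]


def digest(sequence, site, offset):
    """Slice the sequence at every cut position, keeping all characters."""
    pieces = []
    start = 0
    for pos in cut_positions(sequence, site, offset):
        pieces.append(sequence[start:pos])
        start = pos
    pieces.append(sequence[start:])
    return pieces


print(digest('AAAAGATATCAAAGATATCAAAA', 'GATATC', 4))
# ['AAAAGATA', 'TCAAAGATA', 'TCAAAA']
```

Slicing by index avoids any placeholder character, so nothing in the input can collide with a marker.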
Graham
Aran-Fey
import re

Dict = {'Testenzyme': 'TC'}
String = 'AAATTTCCCGGGTCGGGAAA'
TestEnzyme = Dict['Testenzyme']
# Insert ':' between the two letters of the site, then split on it.
print(String.replace(TestEnzyme, re.sub(r'(\w)(\w)', r'\1:\2', TestEnzyme)).split(':'))

This should do the job.

Tomer Lev