I have a string and some characters of the string can have some alternative 'weights' as follows: A S [27.0, 0] D S [27.0, 0] N [1.0, -20.0, 0] P So, for the string 'ASDSNP', I've weights 27.0 and 0.0 for each S and 1.0, -20.0 and 0.0 for each N etc.
I'm generating all the possible combination of the weights for a given string and getting their sum, so for each string I'can have multiple weight sums depending on the weight combinations and I'm populating a dict with different strings and their sum of weights.
I'm generating many sub strings from a long string following some rule and generating the sum by this code, which is taking quite long time and I'm controlling the explosion of combinations keeping only a small number of combinations, max_totalcomb = 2
Can somebody suggest any method to make this run fast? Thanks in advance
import itertools as it
import time
start_time=time.time()
def generate_substr(seq=None):
for f in range(1,len(seq)):
for g in range(1,len(seq)):
if g >f :
yield seq[f:g]
def wt_combinator(anystr=None,max_totalcomb = 2):
'''
Returns sum of a list of combinatorial modification weights
'''
wt_list=[]
anystr = anystr.upper()
wt_sites=char_wt.keys()
for va in anystr:
if va in wt_sites:
wt_list.append(char_wt[va])
wt_combination=list(it.product(*wt_list))
filtered_combination=[e for e in wt_combination if 0 <= sum([int(bool(x)) for x in e]) <= max_totalcomb ]
sumwt_list=[sum(cmb) for cmb in filtered_combination]
if len(sumwt_list):
return sumwt_list
else:
return [0]
char_wt= { 'N': [1.0, -20.0, 0],
'Q': [1.0, 0],
'S': [27.0, 0],
'T': [27.0, 0],
'Y': [27.0, 0]}
my_string = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
str_db = dict()
dbcounter = 0
for fseq in generate_substr(my_string):
swt = wt_combinator(fseq)
for w in swt:
str_db[dbcounter]={'seq':fseq,'sumw':w}
dbcounter +=1
end_time=time.time()
print(end_time - start_time)
print(len(str_db))