This is for a homework assignment, I solved the problem but I'm trying to find a faster solution.
The problem is as follows: I need to figure out how many possible aminoacid (aa) sequences exist that have a total mass m. I have a table of aminoacids (single letter strings) and their corresponding mass (int) which I put in a dictionary.
My initial solution was to create all possible combinations of aa and compare each combination's total mass to the mass m. This works for small numbers of m but the number of combinations gets ridiculously high when m starts being in the hundreds.
I did some minor optimization and got it to work reasonably fast for m < 500 which is good enough for this problem but I want to know how to make it work for higher masses.
This is what I have so far:
totalmass = m
def pepList():
tempList = ['']
temp2List = []
length = 0
total = 0
aminoList = 'GASPVTCINDKEMHFRYW' #this are all the aminoacids
while length < maxLength:
for i in tempList:
for j in aminoList:
pepMass = peptideMass(i+j, massTable) #find the mass of
#this peptide
if pepMass == totalmass:
total += 1
elif pepMass <= totalmass:
temp2List.append(i+j)
tempList = []
for i in temp2List:
tempList.append(i)
temp2List = []
length = length + 1
print (total)
pepList()
I can get a solution for m = 300 in about a second but m = 500 takes about 40 seconds
I tried an alternative using itertools but it wasn't any faster:
total = 0
pepList = []
for i in range(maxLength+1):
for p in itertools.combinations_with_replacement(aminoList, i):
#order matters for the total number of peptides but not for calculating
#the total mass
amino = ''.join(p)
if peptideMass(amino, massTable) == mass:
pepList.append(amino)
print (len(pepList))
newpepList = []
for i in pepList:
for p in itertools.permutations(i, r = len(i)):
#I use permutations here to get the total number because order matters
if p not in newpepList:
newpepList.append(p)
total +=1
print (total)
Sample input: m = 270 output: 22