Hi,
Newbie to Python here.
I have >10,000 strings that represent peptide sequences. Each letter in the string is an amino acid and I would like to calculate the "net sum" of the string after I have replaced each letter with a pre-defined float value (ranging from -1 to -2).
I am stuck on where to start with the loop to make this work. I have the code to clean the strings so that non-alphabetical characters are removed, and to replace each letter with a float value defined in a dictionary (e.g. W: 2.10, G: -1.0).
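To make the clean-then-sum idea concrete for a single string, here is a minimal sketch. It assumes a two-entry scale built only from the example values mentioned above (W and G); `clean_peptide` and `net_sum` are hypothetical names used for illustration, and letters missing from the dictionary are assumed to count as 0.0.

```python
import re

# Illustrative scale only: the two example values from the post, not a full table.
scale = {"W": 2.10, "G": -1.0}


def clean_peptide(raw):
    """Drop every non-alphabetical character and upper-case what is left."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def net_sum(raw, scale):
    """Add up the scale value of each residue; unknown letters count as 0.0."""
    return sum(scale.get(aa, 0.0) for aa in clean_peptide(raw))


print(clean_peptide("G.W-g_2"))
print(net_sum("G.WG", scale))
```

The generator inside `sum` replaces the explicit loop, so no intermediate list of floats has to be stored.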
cleaned peptides, truncated to 5 characters
I imagine the code is something like.
I have 6 dataframes to repeat this process in.
Any help would be immensely appreciated!
Updated Code (THIS WORKS THANKS TO SARAH MESSER)
def hydrophobicity_score(peptide):
    hydro = {
        'A': -0.5,
        'C': -1.0,
        'D': 3.0,
        'E': 3.0,
        'F': -2.5,
        'G': 0.0,
        'H': -0.5,
        'I': -1.8,
        'K': 3.0,
        'L': -1.8,
        'M': -1.3,
        'N': 0.2,
        'P': 0.0,
        'Q': 0.2,
        'R': 3.0,
        'S': 0.3,
        'T': -0.4,
        'V': -1.5,
        'W': -3.4,
        'Y': -2.3,
    }
    hydro_score = [hydro.get(aa, 0.0) for aa in peptide]
    return sum(hydro_score)

og_pep['Hydro'] = og_pep['Peptide'].apply(hydrophobicity_score)
og_pep
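For repeating this across several dataframes, one possible approach is to keep them in a dictionary and loop over it. The sketch below is only an illustration: the scale is shortened to four residues, `score` stands in for `hydrophobicity_score`, and the frame names and peptide strings are made-up placeholders rather than real data.

```python
import pandas as pd

# Shortened, illustrative scale; the full table lives in hydrophobicity_score.
hydro = {"A": -0.5, "G": 0.0, "K": 3.0, "W": -3.4}


def score(peptide):
    """Sum the per-residue values; letters missing from the scale count as 0.0."""
    return sum(hydro.get(aa, 0.0) for aa in peptide)


# Hypothetical stand-ins for the real dataframes (two shown instead of six).
frames = {
    "og_pep": pd.DataFrame({"Peptide": ["AGK", "WW"]}),
    "other_pep": pd.DataFrame({"Peptide": ["KKA", "G"]}),
}

# Add a 'Hydro' column to every dataframe in place.
for name, df in frames.items():
    df["Hydro"] = df["Peptide"].apply(score)
    print(name)
    print(df)
```

Keeping the frames in a dictionary means the same loop works whether there are two or six of them, and each one stays addressable by name afterwards.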