I need to build a frequency dictionary from a pandas Series (the 'amino_acid' column of the DataFrame below). Instead of counting each sequence once per row, I want each entry to add up the corresponding value from the adjacent 'templates' column.
   templates       amino_acid
0        118  CAWSVGQYSNQPQHF
1        635    CASSLRGNQPQHF
2        468    CASSHGTAYEQYF
3        239  CASSLDRLSSGEQYF
4         51    CSVEDGPRGTQYF
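To make the question reproducible, the sample DataFrame above can be rebuilt with a few lines like these (a sketch that only copies the five rows shown; the variable name `df` matches the code below):

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame(
    {
        "templates": [118, 635, 468, 239, 51],
        "amino_acid": [
            "CAWSVGQYSNQPQHF",
            "CASSLRGNQPQHF",
            "CASSHGTAYEQYF",
            "CASSLDRLSSGEQYF",
            "CSVEDGPRGTQYF",
        ],
    }
)
print(df)
```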
Iterating through the DataFrame row by row, as my current approach does, seems inefficient and is even described as an anti-pattern in this post. How can I make this more efficient, or otherwise follow best practice?
My current approach:
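sequence_counts = {}
# Pair each sequence with its template count
seqs = list(zip(df.amino_acid, df.templates))
for seq, templates in seqs:
    # Start a running total the first time a sequence is seen
    if seq not in sequence_counts:
        sequence_counts[seq] = 0
    # Add this row's template count to the sequence's total
    sequence_counts[seq] += templates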
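If you want to stay in plain Python, the same loop can be tidied with `collections.defaultdict`, which removes the membership check. This is a sketch assuming the sample data above; it is still a row-by-row loop, so it is only a readability improvement, not a vectorized one:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    {
        "templates": [118, 635, 468, 239, 51],
        "amino_acid": [
            "CAWSVGQYSNQPQHF",
            "CASSLRGNQPQHF",
            "CASSHGTAYEQYF",
            "CASSLDRLSSGEQYF",
            "CSVEDGPRGTQYF",
        ],
    }
)

# defaultdict(int) returns 0 for a missing key, so no "if seq not in ..." check is needed
sequence_counts = defaultdict(int)
for seq, templates in zip(df["amino_acid"], df["templates"]):
    sequence_counts[seq] += templates

# Convert back to a plain dict
sequence_counts = dict(sequence_counts)
```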
I've seen people do it the way below, but I can't figure out how to adapt it to add each row's respective 'templates' value:
sequence_counts = df['amino_acid'].value_counts().to_dict()
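`value_counts()` only counts rows per sequence. A common pandas way to sum another column per sequence instead is `groupby` followed by `sum`, then `to_dict()`. The sketch below uses the sample data plus one hypothetical repeated sequence (not in the original table) purely to show that repeats are summed:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "templates": [118, 635, 468, 239, 51, 10],
        "amino_acid": [
            "CAWSVGQYSNQPQHF",
            "CASSLRGNQPQHF",
            "CASSHGTAYEQYF",
            "CASSLDRLSSGEQYF",
            "CSVEDGPRGTQYF",
            "CASSLRGNQPQHF",  # hypothetical repeat, added only to show summing
        ],
    }
)

# Group rows by sequence and sum the 'templates' values within each group,
# rather than counting rows the way value_counts() does
sequence_counts = df.groupby("amino_acid")["templates"].sum().to_dict()
print(sequence_counts)
```

By default `groupby` sorts the keys; passing `sort=False` keeps sequences in order of first appearance, which can also be slightly faster.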
Any help/feedback would be greatly appreciated! :)