I am trying to count the frequency of elements in a column of a pandas DataFrame, where each cell holds a list of values (or NaN).
Some toy data:
import numpy as np
import pandas as pd
d = pd.DataFrame({'letters': [['a', 'b', 'c'], np.nan, ['a', 'e', 'd', 'c'], ['a', 'e', 'c']]})
The best I have come up with is to loop through the rows and add the values to a dictionary:
letter_count = {}
for i in range(len(d)):
    if d.iloc[i, ]['letters'] is np.nan:
        continue
    else:
        for letter in d.iloc[i, ]['letters']:
            letter_count[letter] = letter_count.get(letter, 0) + 1
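For reference, here is a minimal sketch of the counts I expect on the toy data, written with `collections.Counter` from the standard library instead of a hand-rolled dictionary. The helper name `count_letters` is my own (hypothetical), and `float('nan')` stands in for `np.nan` so that the snippet runs without pandas. It still iterates in Python, so I do not expect it to be the fast answer I am looking for.

```python
from collections import Counter


def count_letters(rows):
    """Count items across all rows, skipping any row that is not a list (e.g. NaN)."""
    return Counter(
        letter
        for letters in rows
        if isinstance(letters, list)  # NaN rows are floats, so they are skipped
        for letter in letters
    )


# Same values as the toy column; float('nan') is an assumed stand-in for np.nan.
rows = [['a', 'b', 'c'], float('nan'), ['a', 'e', 'd', 'c'], ['a', 'e', 'c']]
letter_count = count_letters(rows)
print(dict(letter_count))
# {'a': 3, 'b': 1, 'c': 3, 'e': 2, 'd': 1}
```

On the DataFrame itself this would be called as `count_letters(d['letters'])`, since iterating over a Series yields its cell values.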
This worked for me, but it was slow because my dataset is large. I assume that avoiding the explicit for-loop would help, but I cannot come up with a more 'pandasian' way to do this.
Any help is appreciated.