First of all sorry if the title is not very self-explanatory, but I didn't quite know how to put it (or search for it).
I have a DataFrame (df) that looks something like:
Num Let
0 1 a
1 2 b
2 3 c
3 4 d
4 1 a
5 2 b
6 1 a
My intention is to create a new df (df_new) with one column containing only one instance of the df['Let'] column, and another one with the sum of their values in df['Num'].
What I've works fine, but I fill it could be done in a more pythonic way. Moreover, I need to apply this code to a large dataset, several times. The result (df_new) should be:
Let Count Sum
0 a 3 3
1 b 2 4
2 d 1 4
3 c 1 3
My code:
# Create df_new
df_new = df['Let'].value_counts().rename_axis('Letters').reset_index(name='Count')
# Find indexes of each instance in df['Let']
def find_indexes(entry):
x = df.index[df['Let'] == entry].tolist()
return x
indexes = [find_indexes(x) for x in a]
# Transform into dictionary
pairs = list(zip(a, indexes))
my_dict = dict(pairs)
def find_sum(entry):
index_list = my_dict[entry] # Get the proper index list
nums = []
for i in index_list:
x = df['Nums'][i]
nums.append(x)
return sum(nums)
df_new['Sum'] = [find_sum(x) for x in df_new['Letters']]
Thanks in advance!