0

First of all sorry if the title is not very self-explanatory, but I didn't quite know how to put it (or search for it).

I have a DataFrame (df) that looks something like:

    Num Let
0     1   a
1     2   b
2     3   c
3     4   d
4     1   a
5     2   b
6     1   a

My intention is to create a new df (df_new) with one column containing only one instance of the df['Let'] column, and another one with the sum of their values in df['Num'].

What I've works fine, but I fill it could be done in a more pythonic way. Moreover, I need to apply this code to a large dataset, several times. The result (df_new) should be:

       Let  Count  Sum
0       a      3    3
1       b      2    4
2       d      1    4
3       c      1    3

My code:

# Create df_new
df_new = df['Let'].value_counts().rename_axis('Letters').reset_index(name='Count')

# Find indexes of each instance in df['Let']
def find_indexes(entry):
    x = df.index[df['Let'] == entry].tolist()
    return x

indexes = [find_indexes(x) for x in a]

# Transform into dictionary
pairs = list(zip(a, indexes))
my_dict = dict(pairs)

def find_sum(entry):
    index_list = my_dict[entry] # Get the proper index list
    nums = []
    for i in index_list: 
        x = df['Nums'][i]
        nums.append(x)
    return sum(nums)            

df_new['Sum'] = [find_sum(x) for x in df_new['Letters']]

Thanks in advance!

Sheri
  • 1,383
  • 3
  • 10
  • 26
martifapa
  • 107
  • 3
  • 12

1 Answers1

1

You can try groupby followed by an aggregation using agg:

df.groupby("Let").agg(["count", "sum"])

output

#     count sum
# Let
# a       3   3
# b       2   4
# c       1   3
# d       1   4

To reset "Let" index, use reset_index:

df.groupby("Let")["Num"].agg(["count", "sum"]).reset_index()
#   Let  count  sum
# 0   a      3    3
# 1   b      2    4
# 2   c      1    3
# 3   d      1    4

Note: here, I specify ["Num"] after the groupby to remove the multi level columns.

Another way to write it is:

df.groupby("Let").agg(count=("Num": lambda x: x.count()),
                      sum=("Num": lambda x: x.sum())) \
                 .reset_index()
Alexandre B.
  • 5,387
  • 2
  • 17
  • 40
  • 1
    Hey Alexandre B., thanks! I see this works perfectly fine when the agg() is applied to one column, but if you try it on more than one column there's no way to avoid the multi-level columns. Anyway, this sure reduces my code! – martifapa May 08 '20 at 14:57
  • Maybe the last way I added might help. Anyway, multi column can be flatten. This [discussion](https://stackoverflow.com/questions/14507794/pandas-how-to-flatten-a-hierarchical-index-in-columns) might help if needed :-) – Alexandre B. May 08 '20 at 15:03