0

I have the following data in pandas df:

quantity    color

1           White
3           White
1           Black
2           Black

I need to group duplicates and get the sum of their quantities:

quantity    color

4           White
3           Black

I have tried the following code to first sort by quantity, then using groupby to combine duplicates and get the sum.

df1 = df.sort_values(by=['quantity','color'], ascending=False)
df2 = df1.groupby(['color']).sum()
print(df2)

I'm getting the following output:

White    13
Black    12

As you can see it's not giving me the sum of the quantity, it's just listing those values side by side in the next column (1,3) and (1,2) for each color.

Bronson77
  • 251
  • 2
  • 3
  • 11

1 Answers1

1

I believe there must be empty spaces in the example you are giving and that's the reason for why astype(int) isn't working. Using pd.to_numeric and passing errors='coerce' will make sure we convert all string values to np.Nan which can be easily dealt with when using groupby

import pandas as pd
a = {'quantity':[1,3,1,2],'color':['White','White','Black','Black']}
df = pd.DataFrame(a)
df['quantity'] = pd.to_numeric(df['quantity'],errors='coerce')
print(df.groupby('color')['quantity'].sum())

Output:

color
Black    3
White    4
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53