1

I have an extremely large pandas DataFrame with multiple rows for a single value 'Standard' under column 'account_type' but different numeric values for different rows in other column headers.

Is there a way to combine all numeric values for 'Standard' without combining the string for every row? I have 180 columns that I need this done for.

Example:

df = pd.DataFrame([
['Standard', 0.2],
['Standard', 0.3],
['Standard', 0.2],
['Standard', 0.4],
['Standard', 0.6],
['Standard', 0.3]], 
columns=['account_type',  'cost'])

Only want:

account_type   cost
'Standard'     2.0   

Minimal coding experience so apologies if not clear.

ayhan
  • 70,170
  • 20
  • 182
  • 203
geds133
  • 1,503
  • 5
  • 20
  • 52
  • 1
    `df.groupby('account_type', as_index=False).sum()` – ALollz Oct 30 '18 at 14:20
  • Didn't mention but I have around 180 columns so don't want to type out each column name in the code. Just want to base everything around account_type and have all 180 columns combine. – geds133 Oct 30 '18 at 14:26
  • Do you need to combine all rows for `Standard` preserving the number of columns (sum all rows in a given column)? Or do you also need to collapse all of the columns and rows into a single number for Standard? – ALollz Oct 30 '18 at 14:29
  • Combine all rows for 'Standard' preserving number of columns and combining numeric values for all other columns @ALollz – geds133 Oct 30 '18 at 14:32
  • @ALollz - Solution is not groupby, dupe is wrong. – jezrael Oct 30 '18 at 14:45
  • @geds133 - added solution for sum of all numeric columns. – jezrael Oct 30 '18 at 14:46
  • @jezrael. Sure then remove the dupe. The provided example just didn't match the actual problem. – ALollz Oct 30 '18 at 14:50
  • @geds133 - So need filter only `Standard` rows and `sum` all numeric columns? – jezrael Oct 30 '18 at 14:51
  • Sure. Many Thanks @jezrael – geds133 Oct 30 '18 at 15:05

1 Answers1

3

Filter only Standard rows by boolean indexing and for new DataFrame use constructor:

a = df.loc[df['account_type'] == 'Standard', 'cost'].sum()
print (a)
2.0

df = pd.DataFrame([['Standard', a]], columns=['account_type',  'cost'])
print (df)
  account_type  cost
0     Standard   2.0

If all values are Standard:

df = pd.DataFrame([['Standard', df['cost'].sum()]], columns=['account_type',  'cost'])

And if want all possible acount_type values is possible aggregate sum:

df = pd.DataFrame([
['Standard1', 0.2],
['Standard1', 0.3],
['Standard1', 0.2],
['Standard2', 0.4],
['Standard2', 0.6],
['Standard', 0.3]], columns=['account_type',  'cost'])

print (df)
  account_type  cost
0    Standard1   0.2
1    Standard1   0.3
2    Standard1   0.2
3    Standard2   0.4
4    Standard2   0.6
5     Standard   0.3

df1 = df.groupby('account_type', as_index=False)['cost'].sum()
print (df1)
  account_type  cost
0     Standard   0.3
1    Standard1   0.7
2    Standard2   1.0

EDIT:

If need sum of all numeric columns:

df = pd.DataFrame({
         'account_type':['Standard'] * 5 + ['another val'],
         'B':[4,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'D':[1,3,5,7,1,0],
         'E':[5,3,6,9,2,4],
         'F':list('aaabbb')
})

print (df)
  account_type  B  C  D  E  F
0     Standard  4  7  1  5  a
1     Standard  5  8  3  3  a
2     Standard  4  9  5  6  a
3     Standard  5  4  7  9  b
4     Standard  5  2  1  2  b
5  another val  4  3  0  4  b

cols = df.select_dtypes(np.number).columns
s = df.loc[df['account_type'] == 'Standard', cols].sum()
print (s)
B    23
C    30
D    17
E    25
dtype: int64

df1 = s.to_frame().T
df1.insert(0, 'account_type', 'Standard')
print (df1)
  account_type   B   C   D   E
0     Standard  23  30  17  25
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252