I'd like to use a loop to change the function applied to a DataFrame and name the output in Python

For example, I would like to calculate the mean, max, sum, min, etc. of the same DataFrame, and I'd like to use a loop to cycle through these functions and name each output.

Say I have a DataFrame df:

import pandas as pd

numbs = [[1, 2, 4], [34, 5, 6], [22, 4, 5]]
df = pd.DataFrame(numbs, columns=['A', 'B', 'C'])

I want to use this calcs set to choose the function applied to df and to name the output, something like this (not valid Python, just the idea):

calcs = {'sum','mean','max'}
for i in calcs:
    ('df'+ i) = df.i

And I was looking for output like

dfsum
A 57
B 11
C 15

dfmean
A  19.000
B  3.667
C  5.000 

etc
Devesh Kumar Singh

3 Answers

You can use agg with a list of functions:

import pandas as pd

numbs = [[1, 2, 4], [34, 5, 6], [22, 4, 5]]
df = pd.DataFrame(numbs, columns=['A', 'B', 'C'])

df_out = df.agg(['mean','max','min'])

print(df_out.loc['mean'])
print(df_out.loc['max'])
print(df_out.loc['min'])

Each row of df_out holds the result of one function, and you can pull it out as a Series by selecting its label with loc.

Output:

A    19.000000
B     3.666667
C     5.000000
Name: mean, dtype: float64
A    34.0
B     5.0
C     6.0
Name: max, dtype: float64
A    1.0
B    2.0
C    4.0
Name: min, dtype: float64
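
To get named outputs like the dfsum and dfmean asked for in the question, one option is to collect the rows of the agg result in a dictionary. This is a minimal sketch, not part of the original answer: the names results and calcs are illustrative, and it assumes dictionary keys are an acceptable substitute for separately named variables. A list is used instead of a set so the order is predictable.

```python
import pandas as pd

numbs = [[1, 2, 4], [34, 5, 6], [22, 4, 5]]
df = pd.DataFrame(numbs, columns=['A', 'B', 'C'])

# A list (rather than a set) keeps the row order of the result predictable.
calcs = ['sum', 'mean', 'max']
df_out = df.agg(calcs)

# One Series per calculation, keyed by a name such as 'dfsum'.
# 'results' is a hypothetical name chosen for this sketch.
results = {'df' + name: df_out.loc[name] for name in calcs}

print(results['dfsum'])
print(results['dfmean'])
```

Looking up results['dfsum'] then plays the role of the dfsum variable, without creating variables dynamically.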
Scott Boston

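Since calcs is already a set of function names, you can pass it to agg directly. Note that a set is unordered, so the columns may not come out in the order the names were written: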

calcs = {'sum','mean','max'}

df.agg(calcs).T.add_prefix('df')

Out[922]:
   dfmax  dfsum     dfmean
A   34.0   57.0  19.000000
B    5.0   11.0   3.666667
C    6.0   15.0   5.000000
Andy L.

If you don't need strings, you can put the functions themselves in the set:

calcs = {pd.DataFrame.sum, pd.DataFrame.mean, pd.DataFrame.max}
# or even with the builtins: {sum, pd.DataFrame.mean, max}
for calc in calcs:
    df.apply(calc)

If you need to use strings, look the method up by name with the builtin getattr:

calcs = {'sum', 'mean', 'max'}
for calc in calcs:
    getattr(df, calc)()
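
The loop above computes each result but does not keep it. As a hedged sketch of how the getattr approach could also name the outputs, the results can be stored in a dictionary. The names results and method are illustrative and not from the original answer, and a list replaces the set only to make the order predictable.

```python
import pandas as pd

numbs = [[1, 2, 4], [34, 5, 6], [22, 4, 5]]
df = pd.DataFrame(numbs, columns=['A', 'B', 'C'])

calcs = ['sum', 'mean', 'max']

# 'results' is a hypothetical container standing in for dfsum, dfmean, ...
results = {}
for calc in calcs:
    method = getattr(df, calc)        # look up the bound method by its name
    results['df' + calc] = method()   # call it and store the resulting Series

for name, series in results.items():
    print(name)
    print(series)
```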
David Zemens