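import pandas as pd
import numpy as np
# Random integers, so the values differ from run to run
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 3)),
                  columns=['price', 'created_year', 'price_per_cm'],
                  index=range(1, 11))
# Add the 'artist' column that appears in the printed frame
df['artist'] = ['degas', 'degas', 'renoir', 'picasso', 'renoir',
                'degas', 'picasso', 'picasso', 'degas', 'picasso']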
>>> df
    price  created_year  price_per_cm   artist
1       9             5             4    degas
2       4             0             8    degas
3       2             5             1   renoir
4       0             0             1  picasso
5       9             0             7   renoir
6       5             0             1    degas
7       6             5             8  picasso
8       9             5             3  picasso
9       0             9             7    degas
10      0             5             9  picasso

I want to group by artist and apply a different function to each of several columns: mean() to 'price' and max() to 'created_year'. This is how I did it:

s1 = df.groupby(['artist'])['price'].mean()
s2 = df.groupby(['artist'])['created_year'].max()
df2 = pd.concat([s1, s2], axis=1)
>>> df2
         price  created_year
artist
degas     4.50             9
picasso   3.75             5
renoir    5.50             5

Is there a more direct way to get this result, without generating two Series and concatenating them back into a DataFrame?
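
For reference, one approach that might fit is passing a dict to `agg()`, so a single groupby applies a different function to each column. This is only a sketch: the frame is rebuilt from the values printed above rather than from `np.random.randint`, so the result can be compared with `df2`.

```python
import pandas as pd

# Sample data copied from the printed frame above (not regenerated randomly).
df = pd.DataFrame(
    {
        "price": [9, 4, 2, 0, 9, 5, 6, 9, 0, 0],
        "created_year": [5, 0, 5, 0, 0, 0, 5, 5, 9, 5],
        "price_per_cm": [4, 8, 1, 1, 7, 1, 8, 3, 7, 9],
        "artist": ["degas", "degas", "renoir", "picasso", "renoir",
                   "degas", "picasso", "picasso", "degas", "picasso"],
    },
    index=range(1, 11),
)

# One groupby; the dict maps each column name to the function applied to it.
df2 = df.groupby("artist").agg({"price": "mean", "created_year": "max"})
print(df2)
```

Columns not listed in the dict (here `price_per_cm`) are left out of the result, which matches the two-column frame produced by the `concat` version.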

Zin Yosrim
