2

I have a dataframe with multiple columns:

import pandas as pd

df = pd.DataFrame({"cylinders":[2,2,1,1],
                  "horsepower":[120,100,80,70],
                  "weight":[5400,6200,7200,1200]})


   cylinders  horsepower  weight
0          2         120    5400
1          2         100    6200
2          1          80    7200
3          1          70    1200

I would like to create a new dataframe with two subcolumns under weight, holding the median and the mean, while grouping by cylinders. Example:

                        weight
  cylinders horsepower  median  mean
0  1          100       5299    5000
1  1          120       5100    5200
2  2           70       7200    6500
3  2           80       1200    1000

I used random values in my example tables. I can't manage to achieve that. I know how to get the median and the mean; it's described in another Stack Overflow question:

df.weight.median()
df.weight.mean()
df.groupby('cylinders') #groupby cylinders

But how to create this subcolumn?

Khan

1 Answer

3

The following code fragment adds the two requested columns. It groups the rows by cylinders, calculates the mean and median of weight, and combines the original dataframe and the result:

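# Per-cylinder statistics; the result is indexed by cylinders
stats = df.groupby('cylinders')['weight'].agg(['mean', 'median'])
# on='cylinders' matches each row's cylinders value against that index
# (a plain df.join(stats) would match on the row index 0..3 instead)
result = df.join(stats, on='cylinders')\
           .sort_values('cylinders', kind='stable')
#   cylinders  horsepower  weight    mean  median
#2          1          80    7200  4200.0  4200.0
#3          1          70    1200  4200.0  4200.0
#0          2         120    5400  5800.0  5800.0
#1          2         100    6200  5800.0  5800.0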

You cannot have "subcolumns" for only some columns in pandas. If one column has "subcolumns," all other columns must have them, too. This is called multiindexing (a MultiIndex on the columns).
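For illustration, here is a minimal sketch of what the multiindexed layout could look like. It assumes the question's dataframe (with horsepower 80 in the third row, as in the printed table) and uses an empty string as the second-level label for cylinders and horsepower; the names `base`, `stats`, and `multi` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"cylinders": [2, 2, 1, 1],
                   "horsepower": [120, 100, 80, 70],
                   "weight": [5400, 6200, 7200, 1200]})

# Per-group statistics, broadcast back to every row of df.
grouped = df.groupby("cylinders")["weight"]
stats = pd.DataFrame({"median": grouped.transform("median"),
                      "mean": grouped.transform("mean")})

# Every column needs a two-level label: the plain columns get an empty
# second level, the statistics sit under "weight".
base = df[["cylinders", "horsepower"]].copy()
base.columns = pd.MultiIndex.from_product([base.columns, [""]])
stats.columns = pd.MultiIndex.from_product([["weight"], stats.columns])

multi = pd.concat([base, stats], axis=1)
print(multi)
```

With this layout, `multi["weight"]` selects both statistics at once and `multi[("weight", "mean")]` selects a single one.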

DYZ
  • Can I have this structure through multiindexing, with weight on top and median and mean under it? – Khan Jan 12 '19 at 20:55
  • Yes you can, but then you must have the second-level index for cylinders and horsepower, too. It may be easier to call the columns 'mean_weight' and 'median_weight'. – DYZ Jan 12 '19 at 20:56
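The flat naming suggested in the last comment could be sketched as follows. This is one possible way to do it, not necessarily what the commenter had in mind: it uses `transform` rather than a join, and the variable names `weight_by_cyl` and `flat` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"cylinders": [2, 2, 1, 1],
                   "horsepower": [120, 100, 80, 70],
                   "weight": [5400, 6200, 7200, 1200]})

# transform returns one value per original row, so the result lines up
# with df without any join or sort.
weight_by_cyl = df.groupby("cylinders")["weight"]
flat = df.assign(mean_weight=weight_by_cyl.transform("mean"),
                 median_weight=weight_by_cyl.transform("median"))
print(flat)
```

The columns stay single-level, so ordinary selection such as `flat["mean_weight"]` keeps working.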