2

I have this pandas DataFrame

                              val
datetime   attribute_id          
2018-01-31 0             4.162565
           1             3.305480
           2             3.191123
           3             3.601398
           4             3.277375
           6             3.556552
2018-02-28 0             0.593762
           1             0.594565
           2             0.583355
           3             0.611113
           4             0.577600
           6             0.638904

And I would like to have a column ratio where for each month each attribute is divided by the mean of all other attributes.

For example, for datetime = 2018-01-31, which refers to the month of January, I would like the column ratio to contain the value of attribute 0 (4.162565) divided by the mean of attributes 1,2,3,4 and 6 which is the mean of 3.305480, 3.191123, 3.601398, 3.277375 and 3.556552. This month-wise for each attribute.

datetime and attribute_id are a MultiIndex.

Does someone know how to do this?

Marco
  • 1,454
  • 1
  • 16
  • 30

1 Answers1

1

You can count mean per first MultiIndex level with GroupBy.transform and create new Series with same size like original DataFrame and dnen divide column by Series.div:

print (df.groupby(level=0)['val'].transform('mean'))
datetime    attribute_id
2018-01-31  0               3.515749
            1               3.515749
            2               3.515749
            3               3.515749
            4               3.515749
            6               3.515749
2018-02-28  0               0.599883
            1               0.599883
            2               0.599883
            3               0.599883
            4               0.599883
            6               0.599883
Name: val, dtype: float64

df['result'] = df['val'].div(df.groupby(level=0)['val'].transform('mean'))
print (df)
                              val    result
datetime   attribute_id                    
2018-01-31 0             4.162565  1.183977
           1             3.305480  0.940192
           2             3.191123  0.907665
           3             3.601398  1.024362
           4             3.277375  0.932198
           6             3.556552  1.011606
2018-02-28 0             0.593762  0.989796
           1             0.594565  0.991135
           2             0.583355  0.972448
           3             0.611113  1.018720
           4             0.577600  0.962854
           6             0.638904  1.065047

If need exlude correct row only change groupby(level=0) in this unutbu solution:

grouped = df.groupby(level=0)
n = grouped['val'].transform('count')
mean = grouped['val'].transform('mean')
df['ratio'] = df['val'] / ((mean*n - df['val']) / (n-1))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hi, thanks for you answer, but I don't want to divide by the mean of all attributes in the given month, but each attribute should by divided by the mean of all *others*. For example attribute 0 should be divided by the mean of attributes 1,2,3,4 and 6 – Marco Jun 20 '19 at 10:25
  • 1
    Thanks! Just change the last line of code to `df['ratio'] = df['val'] / ((mean*n - df['val']) / (n-1))` to give the correct result (I wanted the row divided by the mean and with your code you just get the mean) – Marco Jun 20 '19 at 10:36
  • @Marco - Answer was changed. – jezrael Jun 20 '19 at 10:59