-1

I am relatively new to python and pandas and I am trying to perform a group-wise operation using apply but struggle to get it working.

My data frame looks like this:

Year    Country Val1    Val2    Fact
2005    A   1   3   1
2006    A   2   4   2
2007    A   3   5   2
2008    A   4   3   1
2009    A   4   3   1
2010    A   4   3   1
2005    B   5   7   2
2006    B   6   6   2
2007    B   7   5   1
2008    B   8   6   2
2009    B   8   6   2
2010    B   8   6   2

For each country in each year, I need to calculate (country mean for period 2005-2008 - value in 2005)/4 * Fact * (Year - 2005) + value in 2005

So far I read up on the use of apply and transform and looked at questions related to the use of both functions (e.g.1 and 2) and I thought that my problem can be solved by using a group wise apply.

I tried to set it up like so:

import pandas as pd

df = pd.DataFrame({'Year' : [2005, 2006, 2007, 2008, 2009, 2010, 2005, 2006, 2007, 2008, 2009, 2010],
                'Country' : ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
                'Val1' : [1, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 8],
                'Val2' : [3, 4, 5, 3, 3, 3, 7, 6, 5, 6, 6, 6,],
                'Fact' : [1, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2]
                    })

def func(grp):
    grad = grp[(grp['Year'] > 2004) & (grp['Year'] < 2009)].transform('mean')
    ref = grp[grp['Year'] == 2005]
    grad = (grad - ref)/4
    res = grad * grp['Fact'] * (grp['Year']-2015) * ref
    return res

df.groupby('Country').apply(func)

Running the code yields

        Country Fact    Val1    Val2    Year    0   1   2   3   4   5   6   7   8   9   10  11
Country                                                                     
A   0   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
B   6   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

However, I hoped to receive something along the line of this

Year  Country Val1    Val2    Fact
2005  A       1       3       1
2006  A       1.75    3.375   2
2007  A       2.5     3.75    2
2008  A       2.125   3.5625  1
2009  A       2.125   3.5625  1
2010  A       2.125   3.5625  1
2005  B       5       7       2
2006  B       5.75    6.5     2
2007  B       5.75    6.5     1
2008  B       7.25    5.5     2
2009  B       7.25    5.5     2
2010  B       7.25    5.5     2

I would be very grateful if anybody could point me towards a solution for this problem.

Ankit Agrawal
  • 616
  • 9
  • 20

1 Answers1

0

It is better not do it within one function

s1=df.loc[df.Year.between(2005,2008)].groupby('Country').mean()[['Val1','Val2']]
s2=df.loc[df.Year.eq(2005),['Country','Val1','Val2']].set_index('Country')
s3=df.Year.sub(2005)*df.Fact
s=(s1-s2).div(4).reindex(df.Country).values*s3.values[:,None]+s2.reindex(df.Country).values
df.loc[:,['Val1','Val2']]=s
df
    Year Country   Val1    Val2  Fact
0   2005       A  1.000  3.0000     1
1   2006       A  1.750  3.3750     2
2   2007       A  2.500  3.7500     2
3   2008       A  2.125  3.5625     1
4   2009       A  2.500  3.7500     1
5   2010       A  2.875  3.9375     1
6   2005       B  5.000  7.0000     2
7   2006       B  5.750  6.5000     2
8   2007       B  5.750  6.5000     1
9   2008       B  7.250  5.5000     2
10  2009       B  8.000  5.0000     2
11  2010       B  8.750  4.5000     2
BENY
  • 317,841
  • 20
  • 164
  • 234