I have a data frame as shown below.

Place        Bldng_Id    Num_Bed_Rooms     Contract_date   Rental_value
Bangalore    1           4                 2016-02-16      100
Bangalore    1           4                 2016-05-16      150
Bangalore    1           4                 2017-01-18      450
Bangalore    1           4                 2017-02-26      550
Bangalore    5           4                 2015-02-26      120
Bangalore    5           4                 2016-05-18      180
Bangalore    2           3                 2015-03-06      150
Bangalore    2           3                 2016-05-14      150
Bangalore    2           3                 2017-07-26      220
Bangalore    2           3                 2017-09-19      200
Chennai      3           4                 2016-02-16      100
Chennai      3           4                 2016-05-16      150
Chennai      3           4                 2017-01-18      450
Chennai      3           4                 2017-02-26      550
Chennai      4           3                 2015-03-06      150
Chennai      4           3                 2016-05-14      150
Chennai      4           3                 2017-07-26      220
Chennai      4           3                 2017-09-19      200
Chennai      6           3                 2018-07-26      250
Chennai      6           3                 2019-09-19      280

From the above I would like to prepare the below dataframe.

Expected output:

Place          Num_Bed_Rooms     Year            Avg_Rental_value
Bangalore      3                 2015            150
Bangalore      3                 2016            150
Bangalore      3                 2017            210
Bangalore      4                 2015            120
Bangalore      4                 2016            143.3
Bangalore      4                 2017            500
Chennai        3                 2015            150
Chennai        3                 2016            150
Chennai        3                 2017            210
Chennai        3                 2018            250
Chennai        3                 2019            280
Chennai        4                 2016            125
Chennai        4                 2017            500

I tried the following code to achieve this.

df.groupby(['Place', 'Year', 'Num_Bed_Rooms']).Rental_value.mean()

But the above does not work properly.

From the expected output above, I would like to write time series code to forecast next year's rental value for each case separately.

Danish

1 Answer

If necessary, first convert the values to datetimes:

import pandas as pd
df['Contract_date'] = pd.to_datetime(df['Contract_date'])

Then create a new Year column and pass it to groupby:

df['Year'] = df['Contract_date'].dt.year
df1 = df.groupby(['Place', 'Num_Bed_Rooms','Year'], as_index=False).Rental_value.mean()
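
As a self-contained check, the two steps above can be sketched on a few rows of the question's table. This is an illustrative sketch rather than part of the original answer: it uses only the Bangalore 4-bedroom rows, and it assumes pandas 0.25 or later so that named aggregation can call the result column Avg_Rental_value, as in the expected output.

```python
import pandas as pd

# A subset of the question's data (Bangalore, 4 bedrooms) to keep the sketch short
df = pd.DataFrame({
    "Place": ["Bangalore"] * 6,
    "Bldng_Id": [1, 1, 1, 1, 5, 5],
    "Num_Bed_Rooms": [4] * 6,
    "Contract_date": ["2016-02-16", "2016-05-16", "2017-01-18",
                      "2017-02-26", "2015-02-26", "2016-05-18"],
    "Rental_value": [100, 150, 450, 550, 120, 180],
})

# Parse the date strings, then derive the year used as a grouping key
df["Contract_date"] = pd.to_datetime(df["Contract_date"])
df["Year"] = df["Contract_date"].dt.year

# Named aggregation sets the output column name directly
df1 = (
    df.groupby(["Place", "Num_Bed_Rooms", "Year"], as_index=False)
      .agg(Avg_Rental_value=("Rental_value", "mean"))
)
print(df1)
```

The grouping ignores Bldng_Id on purpose, so the 2016 average mixes buildings 1 and 5.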

Or pass a Series of years directly, without adding a column:

y = df['Contract_date'].dt.year.rename('Year')
df1 = df.groupby(['Place', 'Num_Bed_Rooms', y], as_index=False).Rental_value.mean()

print(df1)
        Place  Num_Bed_Rooms  Year  Rental_value
0   Bangalore              3  2015    150.000000
1   Bangalore              3  2016    150.000000
2   Bangalore              3  2017    210.000000
3   Bangalore              4  2015    120.000000
4   Bangalore              4  2016    143.333333
5   Bangalore              4  2017    500.000000
6     Chennai              3  2015    150.000000
7     Chennai              3  2016    150.000000
8     Chennai              3  2017    210.000000
9     Chennai              3  2018    250.000000
10    Chennai              3  2019    280.000000
11    Chennai              4  2016    125.000000
12    Chennai              4  2017    500.000000
jezrael
  • Thanks. Any ideas on how to forecast the rental value for the next year would be a great help – Danish Dec 26 '19 at 08:09
  • @ALI - Not sure if I understand correctly, but you can check [this](https://stackoverflow.com/questions/55545501/how-to-perform-time-series-analysis-that-contains-multiple-groups-in-python-usin) – jezrael Dec 26 '19 at 08:21
  • Accuracy does not matter; I just need code that runs every case at once, using an exponentially weighted average or something else – Danish Dec 26 '19 at 08:22
  • Thank you so much Jezrael. Will check and try to implement. That is the one I am looking for. But I am not sure I can implement that by myself. – Danish Dec 26 '19 at 08:25
  • @ALI - hmmm, I think it should be a new question, because machine processing is not easy for me (honestly, I specialize in pandas only) – jezrael Dec 26 '19 at 08:26
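
The comments ask for code that handles every case at once with an exponentially weighted average. A minimal sketch of that idea follows. It is not from the answer: the smoothing factor `alpha=0.5`, the `Forecast` column name, and the choice to use the last smoothed value as next year's estimate are all assumptions made for illustration, and with only two or three points per group the result is a rough guess at best.

```python
import pandas as pd

# Yearly averages copied from the answer's output (Bangalore rows only)
df1 = pd.DataFrame({
    "Place": ["Bangalore"] * 6,
    "Num_Bed_Rooms": [3, 3, 3, 4, 4, 4],
    "Year": [2015, 2016, 2017, 2015, 2016, 2017],
    "Rental_value": [150.0, 150.0, 210.0, 120.0, 143.333333, 500.0],
})

rows = []
# Sort by year first so each group's series is in chronological order
for (place, beds), g in df1.sort_values("Year").groupby(["Place", "Num_Bed_Rooms"]):
    # Simple exponential smoothing; alpha=0.5 is an arbitrary illustrative choice
    smoothed = g["Rental_value"].ewm(alpha=0.5, adjust=False).mean()
    rows.append({
        "Place": place,
        "Num_Bed_Rooms": beds,
        "Year": int(g["Year"].max()) + 1,        # the year being forecast
        "Forecast": float(smoothed.iloc[-1]),    # last smoothed value as the estimate
    })

forecast = pd.DataFrame(rows)
print(forecast)
```

Each (Place, Num_Bed_Rooms) pair gets one forecast row for the year after its last observation. A larger `alpha` weights recent years more heavily.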