I have data recorded in the format below.
Input
name year value
Afghanistan 1800 68
Albania 1800 23
Algeria 1800 54
Afghanistan 1801 59
Albania 1801 38
Algeria 1801 72
---
Afghanistan 2040 142
Albania 2040 165
Algeria 2040 120
I would like to resample all of my data, which is recorded yearly from 1800 to 2040, to a monthly frequency, using exactly the format shown below.
Expected output
name year value
Afghanistan Jan 1800 5.6667
Afghanistan Feb 1800 11.3333
Afghanistan Mar 1800 17.0000
Afghanistan Apr 1800 22.6667
Afghanistan May 1800 28.3333
Afghanistan Jun 1800 34.0000
Afghanistan Jul 1800 39.6667
Afghanistan Aug 1800 45.3333
Afghanistan Sep 1800 51.0000
Afghanistan Oct 1800 56.6667
Afghanistan Nov 1800 62.3333
Afghanistan Dec 1800 68.0000
Albania Jan 1800 1.9167
Albania Feb 1800 3.8333
Albania Mar 1800 5.7500
Albania Apr 1800 7.6667
Albania May 1800 9.5833
Albania Jun 1800 11.5000
Albania Jul 1800 13.4167
Albania Aug 1800 15.3333
Albania Sep 1800 17.2500
Albania Oct 1800 19.1667
Albania Nov 1800 21.0833
Albania Dec 1800 23.0000
Algeria Jan 1800 4.5000
Algeria Feb 1800 9.0000
Algeria Mar 1800 13.5000
Algeria Apr 1800 18.0000
Algeria May 1800 22.5000
Algeria Jun 1800 27.0000
Algeria Jul 1800 31.5000
Algeria Aug 1800 36.0000
Algeria Sep 1800 40.5000
Algeria Oct 1800 45.0000
Algeria Nov 1800 49.5000
Algeria Dec 1800 54.0000
I would like my data to look like the above for every year from 1800 to 2040, with the value column interpolated. NB: my model accepts months as abbreviations, as above.
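To make the target concrete, the 1800 rows above are consistent with a straight line that starts at 0 and reaches the yearly value in December (for example, 68 / 12 = 5.6667 for Afghanistan in January). The following is only a minimal sketch of that reading, not a definitive solution. It assumes that each yearly value is the December value of its year, that later years are interpolated linearly from the previous December, and that `year` is still an integer column. The helper name `to_monthly` and the temporary columns `y` and `m` are made up for illustration.

```python
import numpy as np
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_monthly(data):
    """Expand yearly rows into monthly rows by linear interpolation."""
    pieces = []
    for name, grp in data.sort_values("year").groupby("name"):
        years = grp["year"].to_numpy(dtype=int)
        values = grp["value"].to_numpy(dtype=float)
        # Count time in months: t = year * 12 + month, with month in 1..12.
        # ASSUMPTION: each yearly value is reached in December of its year,
        # and the series starts from 0 just before the first year.
        known_t = np.concatenate([[years[0] * 12], years * 12 + 12])
        known_v = np.concatenate([[0.0], values])
        # Every month from January of the first year to December of the last
        t = np.arange(years[0] * 12 + 1, years[-1] * 12 + 13)
        pieces.append(pd.DataFrame({
            "name": name,
            "y": (t - 1) // 12,       # calendar year of each month
            "m": (t - 1) % 12 + 1,    # month number, 1..12
            "value": np.interp(t, known_t, known_v),
        }))
    out = pd.concat(pieces, ignore_index=True)
    # Order rows as in the expected output: year, then name, then month
    out = out.sort_values(["y", "name", "m"]).reset_index(drop=True)
    out["year"] = [f"{MONTHS[m - 1]} {y}" for m, y in zip(out["m"], out["y"])]
    return out[["name", "year", "value"]]


# Small sample taken from the input shown above (1800 and 1801 only)
sample = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"] * 2,
    "year": [1800] * 3 + [1801] * 3,
    "value": [68, 23, 54, 59, 38, 72],
})
monthly = to_monthly(sample)
print(monthly.head(12).round(4))
```

Because this works on plain month counts with `np.interp`, it does not depend on datetime conversion or on resampling frequency aliases. If the values should instead restart from 0 every January, the zero anchor would have to be added per year rather than once.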
My closest attempt is below, but it did not produce the expected result.
data['year'] = pd.to_datetime(data.year, format='%Y')
data.head(3)
name year value
Afghanistan 1800-01-01 00:00:00 68
Albania 1800-01-01 00:00:00 23
Algeria 1800-01-01 00:00:00 54
resampled = (data.groupby(['name']).apply(lambda x: x.set_index('year').resample('M').interpolate()))
resampled.head(3)
name year name value
Afghanistan 1800-01-31 00:00:00 NaN NaN
1800-02-28 00:00:00 NaN NaN
1800-03-31 00:00:00 NaN NaN
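One possible explanation for the NaNs, which may depend on the pandas version: the yearly values are stamped on 1 January, while month-end resampling produces labels such as 1800-01-31. If the resampler reindexes to those labels before interpolating, none of the original points survive, so there is nothing to interpolate between. The string column `name` cannot be interpolated either. The sketch below is a hedged variation of the attempt above, assuming it is acceptable to stamp each value on 31 December of its year so that it coincides with a month-end label. The column `date` and the loop over groups are illustrative choices, and the frequency is passed as `pd.offsets.MonthEnd()` rather than as a string alias.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

data = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria"] * 2,
    "year": [1800] * 3 + [1801] * 3,
    "value": [68, 23, 54, 59, 38, 72],
})

# ASSUMPTION: each yearly value belongs to 31 December of its year, so it
# falls exactly on a month-end label instead of on 1 January.
data["date"] = pd.to_datetime(data["year"].astype(str) + "-12-31")

month_end = pd.offsets.MonthEnd()
pieces = []
for name, grp in data.groupby("name"):
    # Interpolate only the numeric column; the name is added back afterwards
    series = grp.set_index("date")["value"].astype(float)
    monthly = series.resample(month_end).interpolate()
    pieces.append(monthly.rename("value").reset_index().assign(name=name))

resampled = pd.concat(pieces, ignore_index=True)
# Build labels such as "Jun 1801" without relying on locale settings
resampled["year"] = [f"{MONTHS[d.month - 1]} {d.year}" for d in resampled["date"]]
resampled = resampled[["name", "year", "value"]]
print(resampled.head(13))
```

Note that this only fills the months between the first and last known points, so it starts at Dec 1800. Producing Jan to Nov 1800 as in the expected output would need an extra starting point before 1800, for example a value of 0 at the end of 1799.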
Any thoughts would be much appreciated.