I have a wide data frame with several years:
import numpy as np
import pandas as pd

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame(index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
                  columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996, '2010-2012'], name='Year'),
                  data=np.array([[np.nan, np.nan, 16, 17, 18, 19, np.nan],
                                 [16, 17, 18, 19, 20, 21, np.nan],
                                 [np.nan, np.nan, np.nan, np.nan, 16, 17, 31],
                                 [np.nan, 22, 23, 24, np.nan, 26, np.nan],
                                 [36, 36, 37, 38, 39, 40, 55]]))
Year    1991  1992  1993  1994  1995  1996  2010-2012
ID
29925    NaN   NaN  16.0  17.0  18.0  19.0        NaN
223725  16.0  17.0  18.0  19.0  20.0  21.0        NaN
280165   NaN   NaN   NaN   NaN  16.0  17.0       31.0
813285   NaN  22.0  23.0  24.0   NaN  26.0        NaN
956765  36.0  36.0  37.0  38.0  39.0  40.0       55.0
The values in each row are the ages of one person, and each person has a unique ID. I want to fill the NaN values in every row and year of this data frame, based on the age values that already exist in that row.
For example, ID 29925 is 16 in 1993, so we know they are 15 in 1992 and 14 in 1991; therefore I want to replace the NaN for 29925 in the columns 1992 and 1991 with those ages. Similarly, I want to replace the NaN in the column 2010-2012 based on the existing age values for 29925. Let's assume that in the 2010-2012 column 29925 is 15 years older than in 1996. What is the fastest way to do this for the whole data frame, i.e. for all IDs?
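To make the expected result concrete, here is a minimal sketch of the fill logic described above. It is not necessarily the fastest approach. It assumes that the 2010-2012 column can be treated as the single year 2011 (1996 + 15), and that each row's first non-missing age is a reliable anchor; the names years, offsets, row_offset and filled are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996, '2010-2012'], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19, np.nan],
                   [16, 17, 18, 19, 20, 21, np.nan],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17, 31],
                   [np.nan, 22, 23, 24, np.nan, 26, np.nan],
                   [36, 36, 37, 38, 39, 40, 55]]))

# Assumption: '2010-2012' stands for 2011, i.e. 15 years after 1996.
years = np.array([2011 if col == '2010-2012' else int(col) for col in df.columns],
                 dtype=float)

ages = df.to_numpy(dtype=float)

# age - year is constant for a person whose ages are consistent.
offsets = ages - years  # broadcasts (n_rows, n_cols) - (n_cols,)

# Take the first non-missing offset in each row as that person's anchor.
first_valid = np.argmax(~np.isnan(offsets), axis=1)
row_offset = offsets[np.arange(len(df)), first_valid]

# Age implied for every year, then used only where the original is NaN.
implied = row_offset[:, None] + years
filled = pd.DataFrame(np.where(np.isnan(ages), implied, ages),
                      index=df.index, columns=df.columns)

print(filled)
```

Existing values are left untouched, so 280165 keeps 31 in the 2010-2012 column even though the anchor would imply 32. A row with no ages at all would stay entirely NaN.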