Idea is create MultiIndex
by DataFrame.set_index
and DataFrame.reindex
by MultiIndex
created by MultiIndex.from_product
:
data = {'id': [100, 100, 100, 100, 100, 200, 200],
'year': [2010, 2013, 2014, 2015, 2016, 2010, 2012],
'value': [3000, 1000, 2000, 1200, 1300, 2000, 1500]}
df = pd.DataFrame(data)
#you can specify minimal and maximal year by scalar
mux = pd.MultiIndex.from_product([df['id'].unique(),
np.arange(df['year'].min(), 2019)],
names=['id','year'])
df1 = df.set_index(['id','year']).reindex(mux, fill_value=0).reset_index()
print (df1)
id year value
0 100 2010 3000
1 100 2011 0
2 100 2012 0
3 100 2013 1000
4 100 2014 2000
5 100 2015 1200
6 100 2016 1300
7 100 2017 0
8 100 2018 0
9 200 2010 2000
10 200 2011 0
11 200 2012 1500
12 200 2013 0
13 200 2014 0
14 200 2015 0
15 200 2016 0
16 200 2017 0
17 200 2018 0
#you can specify minimal and maximal year by min and max functions
mux = pd.MultiIndex.from_product([df['id'].unique(),
np.arange(df['year'].min(), df['year'].max() + 1)],
names=['id','year'])
df2 = df.set_index(['id','year']).reindex(mux, fill_value=0).reset_index()
print (df2)
id year value
0 100 2010 3000
1 100 2011 0
2 100 2012 0
3 100 2013 1000
4 100 2014 2000
5 100 2015 1200
6 100 2016 1300
7 200 2010 2000
8 200 2011 0
9 200 2012 1500
10 200 2013 0
11 200 2014 0
12 200 2015 0
13 200 2016 0