Suppose I have the following df:
df=pd.DataFrame({'g1':['a','b','c'],
'g2':['x','y','z'],
'val':[1,2,3]})
df
g1 g2 val
a x 1
b y 2
c z 3
whose 'primary key' (using SQL's jargon here) is g1
and g2
.
How do I expand the dataframe such that it has a row for every combination of the existing values in g1
and g2
. For the rows that are added, leaving np.nan under the val column is fine. So the result will look like the following:
g1 g2 val
a x 1.0
a y NaN
a z NaN
b x NaN
b y 2.0
b z NaN
c x NaN
c y NaN
c z 3.0
I am currently doing the following, but I am sure it is not the best solution:
df.set_index(['g1','g2']).T.stack().unstack().T.reset_index()
I can also do merges, but when the number of keys is large the code looks cumbersome.