I have the following `pd.DataFrame` representing the constraints of an optimisation problem:
       FEATURE  COLOR CLASS  CONSTRAINTS
    0      1.0    NaN   NaN         0.20
    1      3.0    NaN   NaN         0.20
    2      1.0    1.0   NaN         0.15
    3      1.0    NaN     b        -0.05
    4      1.0    1.0     a        -0.07
    5      1.0    1.0     b        -0.10
    6      3.0    1.0   NaN         0.10
    7      NaN    NaN   NaN         0.20
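
For reproducibility, the frame above can be rebuilt with something like the following. This is a minimal sketch: the variable name `df` is my own choice, and the values are copied from the table.

```python
import numpy as np
import pandas as pd

# Rebuild the constraints table; np.nan marks the "missing" cells.
df = pd.DataFrame({
    "FEATURE": [1, 3, 1, 1, 1, 1, 3, np.nan],
    "COLOR": [np.nan, np.nan, 1, np.nan, 1, 1, 1, np.nan],
    "CLASS": [np.nan, np.nan, np.nan, "b", "a", "b", np.nan, np.nan],
    "CONSTRAINTS": [0.20, 0.20, 0.15, -0.05, -0.07, -0.10, 0.10, 0.20],
})
print(df)
```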
Here `FEATURE` represents a categorical variable with possible values `[1,2,3]`, `COLOR` represents a categorical variable with possible values `[1,2]`, and `CLASS` is another categorical variable with possible values `[a,b,c,d]`.
Missing values here mean "all other values". In this sense, the dataframe is a compressed version of a larger dataframe covering all or some of the combinations of the columns' categories.
What I would like to do is "expand" the NaN values to all possible values that each individual column can take.
For example, row 0 would expand to 8 rows in total, the product of the "free" features, namely `COLOR` with possible values `[1,2]` and `CLASS` with possible values `[a,b,c,d]`:
    new  FEATURE  COLOR CLASS  CONSTRAINTS
    0          1      1     a          0.2
    1          1      1     b          0.2
    2          1      1     c          0.2
    3          1      1     d          0.2
    4          1      2     a          0.2
    5          1      2     b          0.2
    6          1      2     c          0.2
    7          1      2     d          0.2
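
To make the intended semantics concrete, here is a naive sketch that takes a per-row Cartesian product with `itertools.product`. The names `DOMAINS` and `expand_nans` are hypothetical helpers of mine, the domains are the ones listed above, and the row-by-row loop is probably not the efficient solution I am after. It also does not try to reconcile expanded rows that overlap.

```python
from itertools import product

import numpy as np
import pandas as pd

# Assumed domains of each categorical column, as listed in the question.
DOMAINS = {"FEATURE": [1, 2, 3], "COLOR": [1, 2], "CLASS": ["a", "b", "c", "d"]}

df = pd.DataFrame({
    "FEATURE": [1, 3, 1, 1, 1, 1, 3, np.nan],
    "COLOR": [np.nan, np.nan, 1, np.nan, 1, 1, 1, np.nan],
    "CLASS": [np.nan, np.nan, np.nan, "b", "a", "b", np.nan, np.nan],
    "CONSTRAINTS": [0.20, 0.20, 0.15, -0.05, -0.07, -0.10, 0.10, 0.20],
})

def expand_nans(frame, domains):
    """Replace each NaN cell by every value of its column's domain."""
    rows = []
    for record in frame.to_dict("records"):
        # A NaN cell stands for the whole domain; a filled cell stands for itself.
        choices = [
            domains[col] if pd.isna(record[col]) else [record[col]]
            for col in domains
        ]
        # One output row per combination of the "free" columns.
        for combo in product(*choices):
            row = dict(zip(domains, combo))
            row["CONSTRAINTS"] = record["CONSTRAINTS"]
            rows.append(row)
    out = pd.DataFrame(rows)
    # No NaN remains after expansion, so the numeric columns can be integers.
    return out.astype({"FEATURE": int, "COLOR": int})

expanded = expand_nans(df, DOMAINS)
print(expanded.head(8))
```

With this sketch, rows that are already fully specified (such as rows 4 and 5) pass through unchanged, while row 7, where every column is NaN, expands to all 3 × 2 × 4 = 24 combinations.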
How can I efficiently perform this transformation in Pandas?