I have a very simple DataFrame in pandas:

```python
import numpy as np
import pandas as pd

testdf = [{'name': 'id1', 'W': np.nan, 'L': 0,      'D': 0},
          {'name': 'id2', 'W': 0,      'L': np.nan, 'D': 0},
          {'name': 'id3', 'W': np.nan, 'L': 10,     'D': 0},
          {'name': 'id4', 'W': 75,     'L': 20,     'D': 0}]
testdf = pd.DataFrame(testdf)
testdf = testdf[['name', 'W', 'L', 'D']]
```
which looks like this:
| name | W | L | D |
|------|-----|-----|---|
| id1 | NaN | 0 | 0 |
| id2 | 0 | NaN | 0 |
| id3 | NaN | 10 | 0 |
| id4 | 75 | 20 | 0 |
My goal is simple:
1) I want to impute all the missing values by simply replacing them with a 0.
2) Next I want to create indicator columns containing a 0 or 1 to mark whether the value in the original column was created by the imputation process (1 if it was imputed, 0 otherwise).
It's probably easier to show than to explain in words:
| name | W | W_indicator | L | L_indicator | D | D_indicator |
|------|----|-------------|----|-------------|---|-------------|
| id1 | 0 | 1 | 0 | 0 | 0 | 0 |
| id2 | 0 | 0 | 0 | 1 | 0 | 0 |
| id3 | 0 | 1 | 10 | 0 | 0 | 0 |
| id4 | 75 | 0 | 20 | 0 | 0 | 0 |
My attempts have failed: I get stuck changing all non-NaN values to some placeholder value, then changing all the NaNs to 0, then changing the placeholders back again, and so on. It gets messy very quickly, I keep getting all kinds of slice warnings, and my masks end up jumbled. I'm sure there's a much more elegant way to do this than my wonky heuristic approach.
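For what it's worth, here is a simplified sketch of the kind of placeholder dance I've been attempting, building on the `testdf` defined above (the `-9999` sentinel is just a stand-in I picked, assuming it never appears in the real data; my actual code is messier and works on slices, which is where the warnings come from):

```python
filled = testdf.copy()
PLACEHOLDER = -9999  # arbitrary sentinel; assumes it never occurs in the real data

for col in ['W', 'L', 'D']:
    # tag the missing cells with the sentinel so I can still find them later
    filled[col] = filled[col].fillna(PLACEHOLDER)
    # build the indicator column from the sentinel...
    filled[col + '_indicator'] = (filled[col] == PLACEHOLDER).astype(int)
    # ...then finally replace the sentinel with the imputed 0
    filled[col] = filled[col].replace(PLACEHOLDER, 0)
```

Even when a version of this works, it feels clunky, and the indicator columns end up appended at the end rather than next to their source columns as in the table above.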