I have the following dataframe:
df
Index key1 | key2 | key3 | value1 | Value2
0 1 | 3 | 4 | 6 | 4
1 1 | 3 | 4 | Nan | 3
2 1 | 2 | 3 | 8 | 6
3 1 | 2 | 3 | Nan | 5
4 5 | 7 | 1 | Nan | 2
For the value with the same keys (key1, key2, key3), I want to use numeric value and whenever there is no numeric value I want to remove the row. For value2, I simply want the sum.
Desired df
Index key1 | key2 | key3 | value1 | value2
0 1 | 3 | 4 | 6 | 7
2 1 | 2 | 3 | 8 | 11
Retaining the correct index is not important.
The logic here is that there are two groups of (key1, key2, key3)
that have non-NaN values in value1
. There is a third group where key1, key2, key3 = 5, 7, 1
, which I want to drop because there is no non-NaN value for value1
in this group.
Within each group, replace value1
(if it is NaN) with a non-NaN value1
value from the group. Finally, compute value2
to be the sum of all value2
values in the group.