0

I have the following dataframe:

df

 Index key1 | key2 | key3 | value1 | Value2   
 0    1     | 3    | 4    | 6      |  4 
 1    1     | 3    | 4    | Nan    |  3 
 2    1     | 2    | 3    | 8      |  6
 3    1     | 2    | 3    | Nan    |  5
 4    5     | 7    | 1    | Nan    |  2 

For the value with the same keys (key1, key2, key3), I want to use numeric value and whenever there is no numeric value I want to remove the row. For value2, I simply want the sum.

Desired df

Index key1 | key2 | key3 | value1 | value2    
 0    1    | 3    | 4    | 6      | 7
 2    1    | 2    | 3    | 8      | 11

Retaining the correct index is not important.

The logic here is that there are two groups of (key1, key2, key3) that have non-NaN values in value1. There is a third group where key1, key2, key3 = 5, 7, 1, which I want to drop because there is no non-NaN value for value1 in this group.

Within each group, replace value1 (if it is NaN) with a non-NaN value1 value from the group. Finally, compute value2 to be the sum of all value2 values in the group.

inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
user2512443
  • 485
  • 1
  • 6
  • 20
  • so `df.dropna()`? – sammywemmy Mar 09 '21 at 06:32
  • `and whenever there is no numeric value I want to remove the row` - it means need remove NaNs rows by `value` column ? – jezrael Mar 09 '21 at 06:33
  • The title is misleading. OP wants to group the df by a set of columns (namely, the `key*` columns). Within each group, if `value1` is NaN, use the `value1` value from another row within the same group (what happens if there are conflicting values? OP doesn't specify). Then, set `value2` to be the sum of all `value2` values within that group – inspectorG4dget Mar 09 '21 at 16:21

0 Answers0