Set a pandas column Boolean value based on other columns in the row

Question

Assume a DataFrame

    C1      C2      C3
1   NaN     NaN     NaN
2   20.1    15      200
3   NaN     12      100
4   22.5    8       80

I want to create a new column based on a summarizing boolean of the rest of the row. For example, are any of the values NaN? In that case, my new column value would be "False" for that row.

Or, perhaps, are ALL of the values NaN? In that case, I might want the new column to say False, but otherwise True (we do have some values).

I considered using df.notna() to create a Boolean DataFrame:

    C1      C2      C3
1   False   False   False
2   True    True    True
3   False   True    True
4   True    True    True

I'm sure I'm just missing something simple, but I could not come up with a way to create the fourth column based on OR-ing the existing items in each row.

Also, a generic solution would be nice, one that doesn't require building an interim DF of Booleans.

Background: I have a dataset. Nutrient values are only sampled occasionally, so many of the rows do not contain those values. I would like to have a "Nutrients Sampled" column where the value is True or False based on whether I can expect to see any nutrient sample data in this record. There are 6 possible nutrients and I don't want to check all 6 columns.

I can write the code that checks all 6 columns; I just can't seem to create a new column with a truth value.

This may help. [https://stackoverflow.com/questions/43424199/display-rows-with-one-or-more-nan-values-in-pandas-dataframe](https://stackoverflow.com/questions/43424199/display-rows-with-one-or-more-nan-values-in-pandas-dataframe) — pe-perry, Aug 22 '19 at 01:59

score 2 · Answer 1 · answered Aug 22 '19 at 02:01

You can do that with the any and all methods, which are available on the DataFrame. You just have to pass the argument axis=1 so that the check is made across the columns of each row.

Example:

df['C4'] = pd.notnull(df).any(axis=1)

     C1    C2     C3     C4
0   NaN   NaN    NaN  False
1  20.1  15.0  200.0   True
2   NaN  12.0  100.0   True
3  22.5   8.0   80.0   True
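For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand (the variable names `has_any` and `has_all` are mine, not from the answer) and shows both the "at least one value present" and the "every value present" variants.

```python
import numpy as np
import pandas as pd

# Frame from the question, rebuilt by hand
df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    }
)

# True where at least one value in the row is not NaN
has_any = df.notna().any(axis=1)

# True only where every value in the row is not NaN
has_all = df.notna().all(axis=1)

# Assign after computing both masks, so C4 itself is not included in the check
df["C4"] = has_any
print(df)
```

Computing the masks before assigning the new column matters: once `C4` exists, a later `df.notna()` would include it in the row-wise check.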

Haleemur Ali


But how would I do this if I only care about C2 and C3? – Vicki B Aug 23 '19 at 17:18
You can make a list of the columns you care about, e.g. `mycols = ['C2', 'C3']`, and then apply the statement to that subset: `df['C4'] = pd.notnull(df[mycols]).any(axis=1)` – Haleemur Ali Aug 23 '19 at 18:12
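A short sketch of the column-subset idea from the comment above. The fifth row is not in the question; I added it purely for illustration, so that the subset check and the whole-row check give different answers.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical extra row (C1 present, C2 and C3 missing)
df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5, 5.0],
        "C2": [np.nan, 15, 12, 8, np.nan],
        "C3": [np.nan, 200, 100, 80, np.nan],
    }
)

mycols = ["C2", "C3"]  # only these columns take part in the check

# True where at least one of C2/C3 has a value
subset_mask = df[mycols].notna().any(axis=1)

# For comparison: the same check over every column
whole_row_mask = df.notna().any(axis=1)

df["C4"] = subset_mask
print(df)
```

The same pattern should carry over to the six nutrient columns mentioned in the question: put their names in the list and leave the rest of the expression unchanged.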

score 2 · Answer 2 · answered Aug 22 '19 at 02:05

I feel like we should be using all:

df['New'] = ~df.isna().all(axis=1)
df
     C1    C2     C3    New
1   NaN   NaN    NaN  False
2  20.1  15.0  200.0   True
3   NaN  12.0  100.0   True
4  22.5   8.0   80.0   True
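Worth noting: "not all values are NaN" is logically the same as "at least one value is not NaN", so this expression and the `notna().any(axis=1)` form should give identical results. A small check, using the question's data rebuilt by hand (the variable names are mine):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    },
    index=[1, 2, 3, 4],
)

# "Not every value is missing" ...
not_all_missing = ~df.isna().all(axis=1)

# ... is the same statement as "some value is present"
any_present = df.notna().any(axis=1)

print(not_all_missing.equals(any_present))
```

Which spelling to use is mostly a matter of which one reads more naturally for the column you are naming.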

BENY


stahamtan · Answer 3 · 2019-08-22T02:19:21.077

You can use the apply method and define a function that maps each row to a boolean.

Here is a function that you can customize to your needs (e.g. you can use all instead of any):

# if at least one of the values is NaN
def my_function(row):
    return any(row[['C1', 'C2', 'C3']].isna())

And here is how to apply it to your dataframe and add a new column:

df['new_column'] = df.apply(my_function, axis=1)

    C1      C2      C3      new_column
0   NaN     NaN     NaN     True
1   20.1    15.0    200.0   False
2   NaN     12.0    100.0   True
3   22.5    8.0     80.0    False
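For comparison, here is a sketch that runs the row-wise `apply` version next to a vectorized expression that I would expect to give the same result. The vectorized line is my addition, not part of this answer; on large frames it is usually faster because it avoids calling a Python function once per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    }
)

cols = ["C1", "C2", "C3"]


# True if at least one of the values is NaN
def my_function(row):
    return any(row[cols].isna())


via_apply = df.apply(my_function, axis=1)

# Vectorized equivalent: no per-row Python function call
via_vectorized = df[cols].isna().any(axis=1)

df["new_column"] = via_apply
print(df)
```

Note that this column is True when data is missing, which is the opposite sense of the "Nutrients Sampled" flag in the question; negate it with `~` if you want True to mean "data present".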

score 0 · Answer 4 · answered Aug 22 '19 at 01:59

How about:

# interim df of Booleans (remaining columns elided in the original post)
df = {"C1": [False, True, False, True], ...
df["C4"] = df.apply(lambda x: x.C1 or x.C2 or x.C3, axis=1)

Or ... directly as

original_df["C4"] = original_df.apply(lambda x: np.any(np.isnan(x)), axis = 1)
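One caveat with the `np.isnan` version: it is defined for numeric data, so it should work on the all-float frame in the question, but as far as I know it raises a TypeError on rows that mix in strings or other objects. A sketch of a more tolerant variant using pandas' own missing-value check follows. The `site` and `value` columns are hypothetical and made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: one string column, one float column
original_df = pd.DataFrame(
    {
        "site": ["a", None, "c"],
        "value": [1.0, 2.0, np.nan],
    }
)

# Row-wise with apply, using Series.isna instead of np.isnan
via_apply = original_df.apply(lambda x: x.isna().any(), axis=1)

# Same check without apply
via_vectorized = original_df.isna().any(axis=1)

original_df["C4"] = via_vectorized
print(original_df)
```

Both forms treat `None` and `NaN` as missing, so they do not depend on every column being numeric.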

Regards,

Edward Aung
