-1

Python 3 / Pandas

I am trying to use a function to check the values of various columns in a dataframe, and select only the value from the column that is not NaN.

The data is structured so there is one main column df['C1'] that I want to populate based on the value in one of the next four columns, df['C2'], df['C3'], df['C4'] and df['C5']. When I observe the data, I see that in the rows df['C2'], df['C3'], df['C4'] and df['C5'], every column has a value that is NaN except for one column which has a text value. This is true for all rows in the dataframe. I am trying to write a function that will be applied to the dataframe to find the column which has a text value, and copy that value from the column into df['C1'].

Here is the function I wrote:

def get_component(df):
    if ~df['C2'].isna():
        return df['C2']
    elif ~df['C3'].isna():
        return df['C3']
    elif ~df['C4'].isna():
        return df['C4']
    elif ~df['C5'].isna():
        return df['C5']
df['C1'] = df.apply(get_component, axis=1)

But I get the following error:

AttributeError: ("'float' object has no attribute 'isna'", 'occurred at index 0')

Any ideas on how to fix this error so I can achieve this objective? Is there another method to achieve the same result?

Thanks for the help!

  • Welcome to stack overflow! Please [edit] to include samples of your input data and desired output to make a [mcve] so we can better understand your problem. See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for more help – G. Anderson Mar 04 '20 at 23:23
  • Does this answer your question? [First non-null value per row from a list of Pandas columns](https://stackoverflow.com/questions/31828240/first-non-null-value-per-row-from-a-list-of-pandas-columns) – AMC Mar 05 '20 at 01:22

2 Answers2

0

Nevermind, I figured it just stumbled upon np.where and used the following code to solve the problem:

df['C1'] = np.where(~df['C2'].isna(),df['C2'],
                          np.where(~df['C3'].isna(),df['C3'],
                                  np.where(~df['C4'].isna(),df['C4'],
                                          np.where(~df['C5'].isna(),df['C5'],None))))
0

A solution that makes use of pandas' stack method:

import pandas as pd
import numpy as np

# Initialize example dataframe
df = pd.DataFrame({
    "C2": [np.nan, 3, np.nan, np.nan, np.nan],
    "C3": [5, np.nan, np.nan, np.nan, np.nan],
    "C4": [np.nan, np.nan, np.nan, 7, 3],
    "C5": [np.nan, np.nan, 2, np.nan, np.nan],
})

df["C1"] = df.stack().to_numpy()
print(df)
# Output:
#     C2   C3   C4   C5   C1
# 0  NaN  5.0  NaN  NaN  5.0
# 1  3.0  NaN  NaN  NaN  3.0
# 2  NaN  NaN  NaN  2.0  2.0
# 3  NaN  NaN  7.0  NaN  7.0
# 4  NaN  NaN  3.0  NaN  3.0
Xukrao
  • 8,003
  • 5
  • 26
  • 52