0

I have several columns in my DataFrame that can contain null values. I need to add a column that will check these specific columns and display first non-null value. If all values in those columns are null, then leave it null.

Example (please note that I only need select columns, original DF has columns that are present but not important for this example).

data = [[None, 20, None],[30, None, 30],[10, None, None]]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

      A     B     C
0   NaN  20.0   NaN
1  30.0   NaN  30.0
2  10.0   NaN   NaN

Below is my attempt that looks up all columns in a row, but I need it only to look at values from specific columns (lets call them A, B, and C from example):

df['D'] = df.bfill(axis=1).iloc[:, 0]
print(df)

      A     B     C     D
0   NaN  20.0   NaN  20.0
1  30.0   NaN  30.0  30.0
2  10.0   NaN   NaN  10.0
detrraxic
  • 156
  • 6

2 Answers2

1

This is small toy example that shows a nice way to do it. In this case "first" and "second" are the columns of interest. You can limit the columns to process by passing your desired columns instead of [["first", "second"]]


df = pd.DataFrame([
    [np.nan, np.nan],
    [np.nan, 2],
    [np.nan, 4]
], columns=["first", "second"])

df[["first", "second"]].apply(lambda series: series[series.first_valid_index()] if series.first_valid_index() else np.nan, axis=0)

output is

first     NaN  # first valid value from first column
second    2.0  # first valid value from second column
dtype: float64

edit: you write column but implemented rows. I provided a column example. I can rewrite it (by changing the axis parameter), if you prefer it that way.

edit2 handles nan only columns

Klops
  • 951
  • 6
  • 18
  • Thanks for the tip. I tried with axis=1 and it seems to produce good results however when I have a row with all NaN values it crashes. Is it possible to implement logic that when all col1, col2 values are NaN, that resulting column is also NaN? – detrraxic Aug 01 '23 at 14:41
  • sure, I made a small change to the answer – Klops Aug 01 '23 at 15:13
1

As what you have tried, you can preselect the columns and pass that in your bfill.

>>> df['D'] = df[['A', 'B', 'C']].bfill(axis=1).iloc[:, 0]
>>> df
      A     B     C     D
0   NaN  20.0   NaN  20.0
1  30.0   NaN  30.0  30.0
2  10.0   NaN   NaN  10.0

This will perform bfill across the selected columns only.

oim
  • 1,141
  • 10
  • 14