0

I have a two columns in df, sometimes it has NaN in either one column, sometimes in both columns. I want to fill NaN with same value if any one of the columns values present.

For ex, Input:

       col1      col2
0  3.375000  4.075000
1  2.450000  1.567100
2       NaN       NaN
3  3.248083       NaN
4       NaN  2.335725
5  2.150000  3.218750

Output:

       col1      col2
0  3.375000  4.075000
1  2.450000  1.567100
2       NaN       NaN
3  3.248083  3.248083
4  2.335725  2.335725
5  2.150000  3.218750

For this I tried,

print df.T.fillna(method='bfill').fillna(method='ffill').T

The above give me a required result, But I think I'm adding more complexity to my code. Is there any other better approach for this?

Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111
  • `df.ffill(1).bfill(1)`? – cs95 Dec 18 '18 at 07:01
  • Possible duplicate of [Coalesce values from 2 columns into a single column in a pandas dataframe](https://stackoverflow.com/questions/38152389/coalesce-values-from-2-columns-into-a-single-column-in-a-pandas-dataframe) – Trenton McKinney Oct 26 '19 at 19:21

1 Answers1

4

You don't have to transpose, you can specify an axis:

df.ffill(1).bfill(1)

       col1      col2
0  3.375000  4.075000
1  2.450000  1.567100
2       NaN       NaN
3  3.248083  3.248083
4  2.335725  2.335725
5  2.150000  3.218750

If you have multiple columns, but don't want to touch some of them, you can slice, fill, and assign back.

df
       col1      col2  col3
0  3.375000  4.075000   NaN
1  2.450000  1.567100   2.0
2       NaN       NaN   3.0
3  3.248083       NaN   5.0
4       NaN  2.335725   NaN
5  2.150000  3.218750   5.0

include = ['col1', 'col2']
# Or,
# exclude = ['col3']
# include = df.columns.difference(exclude)
df[include] = df[include].ffill(1).bfill(1)

df

       col1      col2  col3
0  3.375000  4.075000   NaN
1  2.450000  1.567100   2.0
2       NaN       NaN   3.0
3  3.248083  3.248083   5.0
4  2.335725  2.335725   NaN
5  2.150000  3.218750   5.0

If there are only two columns, you can also use combine_first.

df.col1 = df.col1.combine_first(df.col2) 
df.col2 = df.col2.combine_first(df.col1)

       col1      col2
0  3.375000  4.075000
1  2.450000  1.567100
2       NaN       NaN
3  3.248083  3.248083
4  2.335725  2.335725
5  2.150000  3.218750
cs95
  • 379,657
  • 97
  • 704
  • 746