How to remove NaN values while combining two columns in a pandas DataFrame?

Question

I am trying, without success, to remove NaN values while combining two columns of a DataFrame.

The data looks like this:

feedback_id                  _id
568a8c25cac4991645c287ac     nan    
568df45b177e30c6487d3603     nan    
nan                          568df434832b090048f34974       
nan                          568cd22e9e82dfc166d7dff1   
568df3f0832b090048f34711     nan
nan                          568e5a38b4a797c664143dda

I want:

feedback_request_id
568a8c25cac4991645c287ac
568df45b177e30c6487d3603
568df434832b090048f34974
568cd22e9e82dfc166d7dff1
568df3f0832b090048f34711
568e5a38b4a797c664143dda

Here is my code:

df3['feedback_request_id'] = ('' if df3['_id'].empty else df3['_id'].map(str)) + ('' if df3['feedback_id'].empty else df3['feedback_id'].map(str))

Output I'm getting:

feedback_request_id
568a8c25cac4991645c287acnan
568df45b177e30c6487d3603nan
nan568df434832b090048f34974
nan568cd22e9e82dfc166d7dff1
568df3f0832b090048f34711nan
nan568e5a38b4a797c664143dda

I have also tried this:

df3['feedback_request_id'] = ('' if df3['_id']=='nan' else df3['_id'].map(str)) + ('' if df3['feedback_id']=='nan' else df3['feedback_id'].map(str))

But it gives this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
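The ValueError comes from Python's `if ... else`, which needs a single True/False, while `df3['_id'] == 'nan'` is a whole Series of booleans. The concatenation idea itself can be done element-wise instead. A minimal sketch, assuming the missing cells are real NaN values rather than the text 'nan', with the sample data rebuilt by hand:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question; missing cells are np.nan.
df3 = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603", np.nan,
                    np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# fillna("") replaces NaN element by element, so no Series is ever passed to `if`.
# Each row has exactly one non-null id, so the concatenation yields that id.
df3["feedback_request_id"] = df3["_id"].fillna("") + df3["feedback_id"].fillna("")

print(df3["feedback_request_id"])
```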

score 109 · Accepted Answer · answered Jan 25 '16 at 09:44

You can use combine_first or fillna:

print(df['feedback_id'].combine_first(df['_id']))
0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df434832b090048f34974
3    568cd22e9e82dfc166d7dff1
4    568df3f0832b090048f34711
5    568e5a38b4a797c664143dda
Name: feedback_id, dtype: object

print(df['feedback_id'].fillna(df['_id']))
0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df434832b090048f34974
3    568cd22e9e82dfc166d7dff1
4    568df3f0832b090048f34711
5    568e5a38b4a797c664143dda
Name: feedback_id, dtype: object
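A self-contained version of the two calls above, as a sketch: the sample data is rebuilt by hand from the question (missing cells as np.nan), and the result is stored in the feedback_request_id column the question asks for. Both calls fill each gap in feedback_id from the _id value with the same index label, so they agree on this data.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; missing cells are np.nan.
df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603", np.nan,
                    np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# combine_first: keep feedback_id, fall back to _id where feedback_id is NaN.
via_combine_first = df["feedback_id"].combine_first(df["_id"])

# fillna with a Series: each NaN is filled from _id at the same index label.
via_fillna = df["feedback_id"].fillna(df["_id"])

df["feedback_request_id"] = via_fillna
print(df[["feedback_request_id"]])
```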

This only combines two columns; `bfill` is a better option. – Diomedea, Jul 27 '21 at 13:41

BallpointBen · Answer 2 · score 14 · 2018-04-18T20:47:09.810

If you want a solution that doesn't require referencing df twice or naming any of its columns explicitly:

df.bfill(axis=1).iloc[:, 0]

With two columns, this will copy non-null values from the right column into the left, then select the left column.
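A small sketch of the same call with more than two columns, on a hypothetical frame where each row's value sits in a different column: bfill(axis=1) fills each NaN from the nearest non-null value to its right, so column 0 ends up holding the first non-null value of each row. It also shows that bfill returns a new DataFrame and leaves df itself unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each row has its single value in a different column.
df = pd.DataFrame(
    {
        "COLUMN": ["c4", np.nan, np.nan, np.nan],
        "BEAM": [np.nan, "b58", np.nan, np.nan],
        "BRACE": [np.nan, np.nan, "d1027", np.nan],
        "LINE": [np.nan, np.nan, np.nan, "l11"],
    },
    index=["C4", "B58", "D1027", "L11"],
)

# Backfill along each row (right to left), then keep the left-most column.
merged = df.bfill(axis=1).iloc[:, 0]
print(merged)

# bfill returned a new object; the NaNs in df["COLUMN"] are still there.
print(df["COLUMN"].isna().sum())
```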

answered Apr 18 '18 at 20:39 by BallpointBen, edited Apr 18 '18 at 20:47

Will this replace nan values in the column being copied to? – ZakS Sep 16 '19 at 07:25
This is excellent; it will handle multiple columns:

       COLUMN           BEAM             BRACE            LINE
C4     (3651, 3651, 1)  NaN              NaN              NaN
B58    NaN              (4367, 5285, 0)  NaN              NaN
D1027  NaN              NaN              (4311, 4310, 1)  NaN
L11    NaN              NaN              NaN              (3686, 3677, 1)

->

C4       (3651, 3651, 1)
B58      (4367, 5285, 0)
D1027    (4311, 4310, 1)
L11      (3686, 3677, 1)
Name: COLUMN, dtype: object

Execrable editing options... – Diomedea Jul 27 '21 at 13:37

score 13 · Answer 3 · answered Oct 02 '18 at 10:50

For an in-place solution, you can use pd.Series.update with pd.DataFrame.pop:

df['feedback_id'].update(df.pop('_id'))

print(df)

                feedback_id
0  568a8c25cac4991645c287ac
1  568df45b177e30c6487d3603
2  568df434832b090048f34974
3  568cd22e9e82dfc166d7dff1
4  568df3f0832b090048f34711
5  568e5a38b4a797c664143dda
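A hedged caveat and alternative: with Copy-on-Write (the default from pandas 3.0), `df['feedback_id']` behaves as a separate object, so an in-place `update` on it may no longer write back into df, and recent 2.x releases can warn about this chained pattern. A minimal sketch that swaps `update` for `fillna` and assigns the result back explicitly, with the sample data rebuilt by hand:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; missing cells are np.nan.
df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603", np.nan,
                    np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# The right-hand side runs first: read feedback_id, pop _id out of df,
# fill the gaps from the popped Series, then assign the column back.
df["feedback_id"] = df["feedback_id"].fillna(df.pop("_id"))

print(df)
```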

score 3 · Answer 4 · answered Sep 18 '21 at 01:11


The code below should work. If it does not, check whether the nulls in your columns are np.nan or pd.NaT; in my case, only pd.NaT worked.

df[['col1','col2']].bfill(axis=1).iloc[:, 0]
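A quick sketch of why the column selection matters, using a hypothetical frame whose nulls are np.nan plus an extra column, other, that should not take part: restricted to col1 and col2, a row where both are null stays null, whereas backfilling the whole frame pulls that row's value in from other.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "other" must not be used as a fallback.
df = pd.DataFrame({
    "col1": ["a", np.nan, np.nan],
    "col2": [np.nan, "b", np.nan],
    "other": ["x", "y", "z"],
})

# Only col1 and col2 take part, so a row where both are null stays null.
merged = df[["col1", "col2"]].bfill(axis=1).iloc[:, 0]

# Without the column selection, the last row picks up "z" from "other".
unrestricted = df.bfill(axis=1).iloc[:, 0]

print(merged.tolist(), unrestricted.tolist())
```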

answered by PyBoss

Gonçalo Peres · Answer 5 · 2022-10-05T09:53:47.187

Considering the OP's original DataFrame df, the final solution depends on the desired output type:

DataFrame (shape (6, 1))
Series (shape (6,))

Option 1

Assuming one wants the final output to be a DataFrame, one can concatenate _id after feedback_id, drop the NaNs, name the column feedback_request_id, and reset the index, as follows:

df_new = pd.DataFrame(pd.concat([df['feedback_id'], df['_id']]).dropna(), columns=['feedback_request_id']).reset_index(drop=True)

[Out]:
        feedback_request_id
0  568a8c25cac4991645c287ac
1  568df45b177e30c6487d3603
2  568df3f0832b090048f34711
3  568df434832b090048f34974
4  568cd22e9e82dfc166d7dff1
5  568e5a38b4a797c664143dda

Or, using pandas.Series.to_frame:

df_new = pd.concat([df['feedback_id'], df['_id']]).dropna().to_frame(name='feedback_request_id').reset_index(drop=True)

[Out]:
        feedback_request_id
0  568a8c25cac4991645c287ac
1  568df45b177e30c6487d3603
2  568df3f0832b090048f34711
3  568df434832b090048f34974
4  568cd22e9e82dfc166d7dff1
5  568e5a38b4a797c664143dda
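Note that concatenating places every non-null feedback_id value ahead of every non-null _id value, which is why 568df3f0832b090048f34711 lands at position 2 in the outputs above instead of position 4 as in the desired output. A sketch of one way to keep the original row order, assuming each row has exactly one non-null id (a row with two would appear twice): sort by the original index before discarding it.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; missing cells are np.nan.
df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603", np.nan,
                    np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# Stacked order follows the index labels 0, 1, 4, 2, 3, 5.
stacked = pd.concat([df["feedback_id"], df["_id"]]).dropna()

# sort_index restores the original row order (0..5) before the index is
# discarded; this relies on each row contributing exactly one value.
df_new = (
    stacked.sort_index()
    .to_frame(name="feedback_request_id")
    .reset_index(drop=True)
)
print(df_new)
```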

Option 2

Alternatively, if one wants the output to be a Series, the following is enough:

df_new = pd.concat([df['feedback_id'], df['_id']]).dropna().reset_index(drop=True)


[Out]:
0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df3f0832b090048f34711
3    568df434832b090048f34974
4    568cd22e9e82dfc166d7dff1
5    568e5a38b4a797c664143dda

There are other ways to do this. One is to use pandas.Series.where with pandas.notnull:

df_new = df['feedback_id'].where(df['feedback_id'].notnull(), df['_id'])

[Out]:

0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df434832b090048f34974
3    568cd22e9e82dfc166d7dff1
4    568df3f0832b090048f34711
5    568e5a38b4a797c664143dda
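Series.where keeps a value where the condition is True and takes the aligned value from the second argument elsewhere, so on this data it behaves like fillna. A short sketch, with the sample data rebuilt by hand, that also shows Series.mask, which expresses the same thing with the condition inverted:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; missing cells are np.nan.
df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603", np.nan,
                    np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# where: keep feedback_id where it is not null, otherwise take _id.
via_where = df["feedback_id"].where(df["feedback_id"].notnull(), df["_id"])

# mask: replace feedback_id where it IS null, using _id.
via_mask = df["feedback_id"].mask(df["feedback_id"].isna(), df["_id"])

print(via_where.equals(via_mask))
```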

Another is to use pandas.Series.combine with a custom lambda function that calls pandas.notnull:

df_new = df['feedback_id'].combine(df['_id'], lambda x, y: x if pd.notnull(x) else y)

[Out]:
0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df434832b090048f34974
3    568cd22e9e82dfc166d7dff1
4    568df3f0832b090048f34711
5    568e5a38b4a797c664143dda
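Because Series.combine calls the lambda once per element in Python rather than in vectorized code, it is likely to be slower than the other options on large frames. A quick sketch, with the sample data rebuilt by hand, checking that it matches combine_first:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; missing cells are np.nan.
df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603", np.nan,
                    np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# Element-wise: the lambda sees one (feedback_id, _id) pair per row.
result = df["feedback_id"].combine(df["_id"], lambda x, y: x if pd.notnull(x) else y)

# Vectorized reference for comparison.
reference = df["feedback_id"].combine_first(df["_id"])

print(result.tolist() == reference.tolist())
```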
