How do I combine N non-numerical columns while removing null values?

Question

Building on the question "Combining columns and removing NaNs Pandas":

I have a dataframe that looks like this:

col     x       y        z

a1      a       NaN      NaN
a2      NaN     b        NaN
a3      NaN     c        NaN
a4      NaN     NaN      d
a5      NaN     e        NaN
a6      f       NaN      NaN
a7      g       NaN      NaN
a8      NaN     NaN      NaN

The cell values are strings and the NaNs are arbitrary null values.
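For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is a minimal sketch that assumes `col` is an ordinary column and that the nulls are `np.nan`; the real data may use other null markers.

```python
import numpy as np
import pandas as pd

# Example frame from the question; the last row (a8) is null in every column.
df = pd.DataFrame(
    {
        "col": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
        "x": ["a", np.nan, np.nan, np.nan, np.nan, "f", "g", np.nan],
        "y": [np.nan, "b", "c", np.nan, "e", np.nan, np.nan, np.nan],
        "z": [np.nan, np.nan, np.nan, "d", np.nan, np.nan, np.nan, np.nan],
    }
)

# Number of non-null values per row across the columns to be combined.
non_null_per_row = df[["x", "y", "z"]].notna().sum(axis=1)
print(df)
```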

I would like to combine the columns to add a new combined column thus:

col  w

a1   a
a2   b
a3   c
a4   d
a5   e
a6   f
a7   g
a8   NaN

The elegant solution proposed in the question above uses

df['w']=df[['x','y','z']].sum(axis=1)

but sum does not work for non-numerical values.

How, in this case for strings, do I combine the columns into a single column?

You can assume:

Each row only has one of x, y, z that is non-null.
The individual columns must be referenced by name (since they are a subset of all of the available columns in the dataframe).
In general there are N and not just 3 columns in the subset.
Hopefully no use for iloc/for loops :\

Update: (apologies to those who have already given answers :\ )

I have added a final row where every column contains NaN, and I would like the combined row to reflect that. Thanks + sorry!

Thanks as ever for all help

sobek · Accepted Answer · 2018-09-07T07:51:10.287

2

Here is yet another solution:

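# Fill nulls with '', concatenate row-wise, then map empty results back to NaN (needs `import numpy as np`).
df['res'] = df.fillna('').sum(axis=1).replace('', np.nan)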

The result is

       x    y    z  res
col                    
a1     a  NaN  NaN    a
a2   NaN    b  NaN    b
a3   NaN    c  NaN    c
a4   NaN  NaN    d    d
a5   NaN    e  NaN    e
a6     f  NaN  NaN    f
a7     g  NaN  NaN    g
a8   NaN  NaN  NaN  NaN
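Since the question asks for the columns to be referenced by name, the same idea can probably be restricted to a subset. The sketch below assumes object-dtype string columns and `col` as the index; the list name `cols` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with 'col' as the index, as in the output above.
df = pd.DataFrame(
    {
        "col": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
        "x": ["a", np.nan, np.nan, np.nan, np.nan, "f", "g", np.nan],
        "y": [np.nan, "b", "c", np.nan, "e", np.nan, np.nan, np.nan],
        "z": [np.nan, np.nan, np.nan, "d", np.nan, np.nan, np.nan, np.nan],
    },
    dtype=object,
).set_index("col")

cols = ["x", "y", "z"]  # hypothetical name for the subset of columns to combine

# Replace nulls with '', concatenate the strings in each row,
# then turn rows that ended up empty back into NaN.
df["w"] = df[cols].fillna("").sum(axis=1).replace("", np.nan)
print(df)
```

Note that this relies on at most one non-null value per row; with several, the strings would be concatenated.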

edited Sep 07 '18 at 07:51

answered Sep 07 '18 at 07:42

sobek

1,386
10
28

This is amazing and also deals with the update to the question, for which thanks. I have to leave the answers open for a few hours. – jtlz2 Sep 07 '18 at 07:47
Thanks, I see you reverted your answer to accommodate! :) – jtlz2 Sep 07 '18 at 07:47

Space Impact · Answer 2 · 2018-09-07T08:46:59.663

1

I think you need:

s = df[['x','y','z']]
df['w'] = s.values[s.notnull()]
df[['col','w']]

Or, after the edit to the question:

df['w'] = pd.DataFrame(df[['x','y','z']].apply(lambda x: x.values[x.notnull()],axis=1).tolist())
df[['col','w']].fillna(np.nan)

Which gives

    col w
0   a1  a
1   a2  b
2   a3  c
3   a4  d
4   a5  e
5   a6  f
6   a7  g
7   a8  NaN
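A fully vectorized variant that also copes with the all-null row might look like the sketch below. It is not the code above but a swapped-in technique: it picks the first non-null entry per row with `argmax` on the boolean mask and falls back to NaN where the row has none. The names `cols`, `mask`, `first` and `picked` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
        "x": ["a", np.nan, np.nan, np.nan, np.nan, "f", "g", np.nan],
        "y": [np.nan, "b", "c", np.nan, "e", np.nan, np.nan, np.nan],
        "z": [np.nan, np.nan, np.nan, "d", np.nan, np.nan, np.nan, np.nan],
    },
    dtype=object,
)

cols = ["x", "y", "z"]
s = df[cols]

mask = s.notna().to_numpy()    # True where a value is present
first = mask.argmax(axis=1)    # position of the first True per row (0 if none)
picked = s.to_numpy()[np.arange(len(s)), first]

# Keep the picked value only where the row has at least one non-null entry.
df["w"] = np.where(mask.any(axis=1), picked, np.nan)
print(df[["col", "w"]])
```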

edited Sep 07 '18 at 08:46

answered Sep 07 '18 at 07:20

Space Impact

13,085
23
48

So sorry - I have updated the question - are you able to reflect the update? :\ – jtlz2 Sep 07 '18 at 07:46
@jtlz2 This NumPy vectorized approach to the problem was inspired by my previous solution (https://stackoverflow.com/questions/52113568/generate-new-dataframe-without-nan-values/52113743#52113743). – Space Impact Sep 07 '18 at 08:00
Oh gosh - does this mean my Q is a dupe? :\ – jtlz2 Sep 07 '18 at 08:01
Thx so so much! – jtlz2 Sep 07 '18 at 08:05
@jtlz2 Updated the solution. – Space Impact Sep 07 '18 at 08:47

pietroppeter · Answer 3 · 2018-09-07T08:05:12.700

0

Instead of the generic sum, you have to apply a custom function. This one, for example, works on your data:

import numpy as np
# Take the first non-null value in the row, or NaN when the whole row is null.
f = lambda x: x[x.notnull()].iloc[0] if x.notnull().any() else np.nan
df['w'] = df[list('xyz')].apply(f, axis=1)
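If a row-wise `apply` turns out to be slow on a large frame, one column-wise alternative is to fold the named columns together with `Series.combine_first`, which fills the nulls of one Series from another. This is a different technique from the lambda above, sketched under the assumption that the first non-null value in column order is the one wanted; `cols` is an illustrative name.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
        "x": ["a", np.nan, np.nan, np.nan, np.nan, "f", "g", np.nan],
        "y": [np.nan, "b", "c", np.nan, "e", np.nan, np.nan, np.nan],
        "z": [np.nan, np.nan, np.nan, "d", np.nan, np.nan, np.nan, np.nan],
    },
    dtype=object,
)

cols = ["x", "y", "z"]  # any number of columns, referenced by name

# Start from the first column and fill its nulls from each later column in turn.
df["w"] = reduce(lambda acc, nxt: acc.combine_first(nxt), (df[c] for c in cols))
print(df[["col", "w"]])
```

Rows that are null in every listed column stay NaN, which matches the updated question.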

edited Sep 07 '18 at 08:05

answered Sep 07 '18 at 07:21

pietroppeter

1,433
13
30

I'm so sorry, I have updated the question - is your answer adaptable? :\ – jtlz2 Sep 07 '18 at 07:46
Yes, I did adapt it. – pietroppeter Sep 07 '18 at 08:20
Welcome! I must say anyway that I like the answer of sobek better than mine... :) – pietroppeter Sep 07 '18 at 08:33
