
Building on the question "Combining columns and removing NaNs Pandas",

I have a dataframe that looks like this:

col     x       y        z

a1      a       NaN      NaN
a2      NaN     b        NaN
a3      NaN     c        NaN
a4      NaN     NaN      d
a5      NaN     e        NaN
a6      f       NaN      NaN
a7      g       NaN      NaN
a8      NaN     NaN      NaN

The cell values are strings and the NaNs are arbitrary null values.
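To make this reproducible, the sample frame can be built as below. This is a sketch: whether `col` is an ordinary column or the index is an assumption (the answers below treat it both ways), and here it is kept as a column.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'col' is kept as an ordinary column here
# (assumption: the original frame may use it as the index instead).
df = pd.DataFrame({
    'col': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8'],
    'x': ['a', np.nan, np.nan, np.nan, np.nan, 'f', 'g', np.nan],
    'y': [np.nan, 'b', 'c', np.nan, 'e', np.nan, np.nan, np.nan],
    'z': [np.nan, np.nan, np.nan, 'd', np.nan, np.nan, np.nan, np.nan],
})
```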

I would like to combine the columns to add a new combined column thus:

col  w

a1   a
a2   b
a3   c
a4   d
a5   e
a6   f
a7   g
a8   NaN

The elegant solution proposed in the question above uses

df['w']=df[['x','y','z']].sum(axis=1)

but sum does not work for non-numerical values.

How, in this case for strings, do I combine the columns into a single column?

You can assume:

  1. Each row only has one of x, y, z that is non-null.
  2. The individual columns must be referenced by name (since they are a subset of all of the available columns in the dataframe).
  3. In general there are N and not just 3 columns in the subset.
  4. Ideally no use of iloc/for loops :\

Update: (apologies to those who have already given answers :\ )

  1. I have added a final row where every column contains NaN, and I would like the combined column to contain NaN for that row. Thanks + sorry!

Thanks as ever for all your help.

jtlz2

3 Answers


Here is yet another solution:

import numpy as np
df['res'] = df[['x', 'y', 'z']].fillna('').sum(axis=1).replace('', np.nan)

The result is

       x    y    z  res
col                    
a1     a  NaN  NaN    a
a2   NaN    b  NaN    b
a3   NaN    c  NaN    c
a4   NaN  NaN    d    d
a5   NaN    e  NaN    e
a6     f  NaN  NaN    f
a7     g  NaN  NaN    g
a8   NaN  NaN  NaN  NaN
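The same fill-then-sum idea extends to any named subset of N columns. A minimal sketch, assuming at most one non-null string per row; `combine_string_columns` is a hypothetical helper name, not a pandas API:

```python
import numpy as np
import pandas as pd


def combine_string_columns(df, cols):
    """Concatenate the named columns row-wise, treating nulls as ''.

    Assumes at most one non-null string per row; rows where every
    selected column is null come back as NaN.
    """
    # Summing object columns concatenates the strings; '' contributes nothing
    combined = df[cols].fillna('').sum(axis=1)
    # An all-null row sums to '', so map it back to NaN
    return combined.replace('', np.nan)


df = pd.DataFrame({
    'x': ['a', np.nan, np.nan, np.nan],
    'y': [np.nan, 'b', np.nan, np.nan],
    'z': [np.nan, np.nan, 'd', np.nan],
}, index=['a1', 'a2', 'a4', 'a8'])

df['w'] = combine_string_columns(df, ['x', 'y', 'z'])
print(df['w'].tolist())
```

Note that if a row ever held two strings, they would be concatenated rather than flagged, so this relies on the one-value-per-row assumption.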
sobek
  • This is amazing and also deals with the update to the question, for which thanks. I have to leave the answers open for a few hours. – jtlz2 Sep 07 '18 at 07:47
  • Thanks, I see you reverted your answer to accommodate! :) – jtlz2 Sep 07 '18 at 07:47

I think you need:

s = df[['x','y','z']]
# Only valid when every row has exactly one non-null value
df['w'] = s.values[s.notnull().values]
df[['col','w']]

Or, after the edit to the question:

import numpy as np
import pandas as pd

df['w'] = pd.DataFrame(df[['x','y','z']].apply(lambda x: x.values[x.notnull()], axis=1).tolist())
df[['col','w']].fillna(np.nan)

Which gives

    col w
0   a1  a
1   a2  b
2   a3  c
3   a4  d
4   a5  e
5   a6  f
6   a7  g
7   a8  NaN
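The boolean-mask idea can also be made robust to all-null rows without `apply`. A sketch, assuming string/object columns; `first_non_null` is a hypothetical helper, and using `argmax` to locate the first `True` in each row is a substitution for the answer's own indexing:

```python
import numpy as np
import pandas as pd


def first_non_null(df, cols):
    """Pick, per row, the first non-null value among `cols` (NaN if none)."""
    values = df[cols].to_numpy(dtype=object)
    mask = df[cols].notna().to_numpy()
    # argmax gives the position of the first True in each row (0 if none)
    first = mask.argmax(axis=1)
    picked = values[np.arange(len(values)), first]
    # Rows with no non-null value get NaN explicitly
    return pd.Series(np.where(mask.any(axis=1), picked, np.nan), index=df.index)


df = pd.DataFrame({
    'col': ['a1', 'a2', 'a4', 'a8'],
    'x': ['a', np.nan, np.nan, np.nan],
    'y': [np.nan, 'b', np.nan, np.nan],
    'z': [np.nan, np.nan, 'd', np.nan],
})
df['w'] = first_non_null(df, ['x', 'y', 'z'])
print(df[['col', 'w']])
```

Unlike the first snippet, the output length always matches the number of rows, so assignment back to `df` cannot fail on an all-null row.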
Space Impact

Instead of the generic sum, you can apply a custom function. This one, for example, works on your example:

import numpy as np
# Return the first non-null value in the row, or NaN if the whole row is null
f = lambda x: x[x.notnull()].iloc[0] if x.notnull().any() else np.nan
df['w'] = df[list('xyz')].apply(f, axis=1)
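A vectorized alternative, and a different technique from the `apply` above, is to back-fill along the columns and then take the first one. A sketch, assuming the column names live in a list `cols`; the `iloc[:, 0]` selects a single column once rather than looping over rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': ['a', np.nan, np.nan, np.nan],
    'y': [np.nan, 'b', np.nan, np.nan],
    'z': [np.nan, np.nan, 'd', np.nan],
}, index=['a1', 'a2', 'a4', 'a8'])

cols = ['x', 'y', 'z']
# Back-fill across the selected columns so each row's first non-null value
# lands in the first column; all-null rows stay NaN.
df['w'] = df[cols].bfill(axis=1).iloc[:, 0]
print(df['w'].tolist())
```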
pietroppeter