I have a dataframe like this:

   c1   c2   c3
0   a   NaN  NaN
1  NaN   b   NaN
2  NaN  NaN   c
3  NaN   b   NaN
4   a   NaN  NaN

I want to combine these three columns into one, like this:

    c4
0    a
1    b
2    c
3    b
4    a

Here is the code to make the above data frame:

import pandas as pd
import numpy as np

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})
luckyCasualGuy
  • Try `a.bfill(axis=1).iloc[:,0]` – cs95 Jul 06 '20 at 07:34
  • @cs95 I saw that ans too man and this is not exactly same !!! (∩︵∩) !!! – luckyCasualGuy Jul 06 '20 at 07:54
  • Unfortunately it sort of is. Both questions call for collapsing non-null values into a single column. I didn't think to look for a duplicate until I started looking for the source to [Divakar's justify code](https://stackoverflow.com/questions/44558215/python-justifying-numpy-array/44559180#44559180). – cs95 Jul 06 '20 at 07:56
  • Nothing wrong with marking this question as duplicate - it is a _good_ thing, you are acting as a guidepost to other, more standard resources on the site. I am not "flagging" you, this is a privilege I am using as a member with more experience on the site :-) – cs95 Jul 06 '20 at 08:03
  • Additionally, you've already received answers to your _own_ question, which doesn't always happen for a question marked duplicate, so that's a good thing! Did either of those answers work for you? – cs95 Jul 06 '20 at 08:04
  • Others cannot provide me with more options if its marked duplicate – luckyCasualGuy Jul 06 '20 at 08:05
  • If you have any issue with the existing answers, please leave a comment and we'd get back ASAP. As for _new_ options, please take it from me they'll rehash the answers in [this link](https://stackoverflow.com/questions/56583174/how-to-collapse-columns-in-pandas-on-null-values), so I see no point in reopening. Last word on this. – cs95 Jul 06 '20 at 08:06

2 Answers

You could try this:

import pandas as pd
import numpy as np

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan]
})

# Replace NaN with '' so that summing each row concatenates its strings
newdf = pd.DataFrame({'c4': a.fillna('').values.sum(axis=1)})

Output:

newdf

  c4
0  a
1  b
2  c
3  b
4  a

I also found this option in jpp's answer, which takes advantage of the fact that np.nan != np.nan and uses a list comprehension. It may be the fastest way:

# i == i is False only for NaN, so this keeps the non-null values
newdf = pd.DataFrame({'c4': [i for row in a.values for i in row if i == i]})
print(newdf)
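
Note that the comprehension above flattens every non-null value in row order, so it lines up with the original rows only when each row holds exactly one non-null value. A minimal sketch of a per-row variant that uses the same `v == v` test is shown here; the helper name `first_valid` and the variable `c4` are illustrative and not part of jpp's answer:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})

def first_valid(row):
    """Return the first entry that equals itself (i.e. is not NaN), else NaN.

    Hypothetical helper, written only for this illustration.
    """
    return next((v for v in row if v == v), np.nan)

# One value per row, so the result keeps the original index even if a row
# has several non-null entries or none at all.
c4 = pd.DataFrame({'c4': [first_valid(row) for row in a.to_numpy()]},
                  index=a.index)
print(c4)
```

This trades a little speed for robustness; for data shaped like the example, both versions should give the same column.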
MrNobody33

Backfilling along the columns is one option:

a.bfill(axis=1).iloc[:,0]

0    a
1    b
2    c
3    b
4    a
Name: c1, dtype: object
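
The result above is a Series that still carries the name `c1`. A small sketch of turning it into the single-column frame the question asks for, assuming the column should be called `c4` (the intermediate name `filled` is only for illustration):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})

# Back-fill across columns: each NaN takes the next non-null value to its
# right, so the first column ends up holding each row's first non-null value.
filled = a.bfill(axis=1)

# Take that first column, rename it, and wrap it in a DataFrame.
c4 = filled.iloc[:, 0].rename('c4').to_frame()
print(c4)
```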

Another one is a simple stack, which gets rid of the NaNs.

a.stack().reset_index(level=1, drop=True)

0    a
1    b
2    c
3    b
4    a
dtype: object
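
How `stack` treats NaNs by default may differ between pandas versions (worth checking against the version you have installed), so the sketch below calls `dropna()` explicitly instead of relying on the default. The names `stacked` and `c4` are illustrative:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})

# stack() moves the column labels into an inner index level; dropna() then
# removes any NaN entries that are still present.
stacked = a.stack().dropna()

# Drop the inner (column-label) level to get back the original row index.
c4 = stacked.reset_index(level=1, drop=True).rename('c4').to_frame()
print(c4)
```

As with the list comprehension, this lines up with the original rows only when each row has exactly one non-null value; a row with two values would appear twice.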

Another interesting option you don't see every day is using the power of NumPy. The calls below assume a modified version of Divakar's justify utility that works with object DataFrames.

justify(a.to_numpy(), invalid_val=np.nan)[:,0]
# array(['a', 'b', 'c', 'b', 'a'], dtype=object)

# as a Series
pd.Series(justify(a.to_numpy(), invalid_val=np.nan)[:,0], index=a.index)

0    a
1    b
2    c
3    b
4    a
dtype: object
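
The `justify` function itself is not reproduced in this answer. As a rough idea of what a left-justify over an object array can look like, here is a minimal sketch built on a sorted boolean mask; `justify_left` is a hypothetical name and a simplified stand-in, not Divakar's original code or the modified version used above:

```python
import numpy as np
import pandas as pd

def justify_left(arr):
    """Push the non-null entries of each row to the left, padding with NaN.

    Hypothetical, simplified helper for 2D object arrays.
    """
    mask = pd.notna(arr)                       # True where a value is present
    # Sorting each row of the mask and reversing it puts the True cells
    # first, which marks where the values should land.
    target = np.sort(mask, axis=1)[:, ::-1]
    out = np.full(arr.shape, np.nan, dtype=object)
    # Both masks have the same number of True cells per row, so values stay
    # in their own row and keep their left-to-right order.
    out[target] = arr[mask]
    return out

a = pd.DataFrame({
    'c1': ['a', np.nan, np.nan, np.nan, 'a'],
    'c2': [np.nan, 'b', np.nan, 'b', np.nan],
    'c3': [np.nan, np.nan, 'c', np.nan, np.nan],
})

c4 = pd.Series(justify_left(a.to_numpy())[:, 0], index=a.index, name='c4')
print(c4)
```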
cs95