Pandas DataFrame: How to quickly get first non-NaN value in each row?

Asked Aug 06 '15 at 16:17

Active Aug 06 '15 at 17:03

Viewed 3,862 times

import pandas
import numpy

list_of_lines = [['a', 'b', 'c', 'd'],
                 ['a', 'b', 'c', 'd'],
                 ['a', 'b', 'c', 'd'],
                 ['a', 'b', 'c', 'd']]

df1 = pandas.DataFrame(list_of_lines)

# .ix has been removed from pandas; use positional .iloc instead
df1.iloc[2, 0] = numpy.nan
df1.iloc[2, 1] = numpy.nan

Is there a way to get the first non-NaN value from each row?

The way I am currently doing it is:

df1 = df1.bfill(axis = 1)

# after back-filling along each row, column 0 holds the first non-NaN value
my_answer = df1.iloc[:, 0]

This gives me:

'a'
'a'
'c'
'a'

But it is very slow.

Is there some trick or native way of doing this without having to bfill? My dataframe is 1,500 columns x 10,000 rows and it takes a while to bfill.

There is a similar question here but that answer is slower than my current method.
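For reference, here is a minimal sketch of one bfill-free approach that might be worth timing. It is not taken from the linked answer; the helper name first_non_nan is hypothetical. The idea is to build a boolean mask of non-NaN cells, use argmax along each row to find the position of the first True, and pick those cells out of the underlying NumPy array. A row that is entirely NaN simply yields NaN, because argmax falls back to position 0.

```python
import numpy as np
import pandas as pd


def first_non_nan(df):
    """Return a Series holding the first non-NaN value of each row.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = df.to_numpy()
    mask = pd.notna(values)               # True where a cell is not NaN
    first_col = mask.argmax(axis=1)       # position of the first True per row
    rows = np.arange(values.shape[0])
    # For an all-NaN row argmax returns 0, so the selected cell is NaN.
    return pd.Series(values[rows, first_col], index=df.index)


# Same toy data as in the question
df1 = pd.DataFrame([['a', 'b', 'c', 'd']] * 4)
df1.iloc[2, 0] = np.nan
df1.iloc[2, 1] = np.nan

result = first_non_nan(df1)
print(result.tolist())
```

Whether this beats bfill on a 1,500 x 10,000 frame depends on the dtypes involved, so it should be benchmarked on the real data before relying on it.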

edited Aug 06 '15 at 17:03


user1367204



I submitted an answer to the original question which may have the performance you are looking for. – JoeCondron Aug 06 '15 at 18:03

Thank you. I got a 5x speed-up. Exactly what I needed. – user1367204 Aug 06 '15 at 18:16
