Pandas - find first non-null value in column

Question

I have a Series whose values are either NULL or some non-null value. How can I find the first row where the value is not NULL, so that I can report its datatype back to the user? All non-null values in the Series share the same datatype.

Dupe: http://stackoverflow.com/questions/23309514/computing-the-first-non-missing-value-from-each-column-in-a-dataframe — EdChum, Feb 09 '17 at 13:13

score 66 · Accepted Answer · edited Aug 11 '18 at 13:26

You can use `first_valid_index` to get the label of the first non-null value and then select that value with `loc`:

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2, np.nan])
print (s)
0    NaN
1    2.0
2    NaN
dtype: float64

print (s.first_valid_index())
1

print (s.loc[s.first_valid_index()])
2.0

# If your Series contains ALL NaNs, you'll need to check as follows:

s = pd.Series([np.nan, np.nan, np.nan])
idx = s.first_valid_index()  # Will return None
first_valid_value = s.loc[idx] if idx is not None else None
print(first_valid_value)
None
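Since the question's goal is to report the datatype, here is a minimal sketch that wraps the `first_valid_index` plus `loc` lookup in a helper. The name `first_valid_type` is hypothetical, and the helper assumes the index labels are unique (the comments below discuss what happens with duplicates).

```python
import numpy as np
import pandas as pd


def first_valid_type(s):
    """Return the type of the first non-null value in ``s``, or None if all values are null.

    Hypothetical helper; assumes the index labels of ``s`` are unique.
    """
    idx = s.first_valid_index()  # None when the Series is empty or all-null
    if idx is None:
        return None
    return type(s.loc[idx])


print(first_valid_type(pd.Series([np.nan, 2, np.nan])))       # <class 'numpy.float64'>
print(first_valid_type(pd.Series([None, "a.ogg", None])))     # <class 'str'>
print(first_valid_type(pd.Series([np.nan, np.nan, np.nan])))  # None
```

For numeric data, `s.dtype` already tells you the type; looking at the first element mainly matters for `object` columns, where the dtype alone does not say what the values are.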

edited Aug 11 '18 at 13:26 by The Aelfinn · answered Feb 09 '17 at 13:06 by jezrael

If the series contains duplicate index values, `s.loc[idx]` would actually return a Series. @jezrael do you think there's a good general solution that will also work in that case, or is conditioning on the type of `first_valid_value` inevitable? – stav May 19 '20 at 14:50
@Stav - Not an easy question; maybe it is best to post a new question. – jezrael May 19 '20 at 14:52
@jezrael For the last valid value, would you just reverse the series and use the same fn? – jtlz2 Feb 02 '23 at 10:42
@jtlz2 - use `last_valid_index` – jezrael Feb 02 '23 at 10:44
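One possible way around the duplicate-label problem raised in these comments is to look values up by position instead of by label. A minimal sketch, with `first_and_last_valid` as a hypothetical helper name: NumPy's `argmax` on a boolean array returns the position of the first `True`, and `iloc` fetches by position, so each lookup yields a scalar even when labels repeat. For unique labels, `first_valid_index` / `last_valid_index` with `loc` give the same values.

```python
import numpy as np
import pandas as pd


def first_and_last_valid(s):
    """Return (first, last) non-null values of ``s`` by position, or (None, None).

    Hypothetical helper: positions are unique even when index labels repeat,
    so each lookup returns a scalar rather than a sub-Series.
    """
    mask = s.notna().to_numpy()
    if not mask.any():
        return None, None
    first_pos = mask.argmax()                       # argmax of a bool array = first True
    last_pos = len(mask) - 1 - mask[::-1].argmax()  # first True counted from the end
    return s.iloc[first_pos], s.iloc[last_pos]


s = pd.Series([np.nan, 2.0, np.nan, 5.0], index=["x", "x", "y", "y"])
print(type(s.loc[s.first_valid_index()]))  # label "x" is duplicated, so this is a Series
print(first_and_last_valid(s))             # the scalars 2.0 and 5.0
```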

PdevG · Answer 2 · 2017-02-09T13:51:39.347

For a Series, the following returns the first non-null value.

Create the Series `s`:

import numpy as np
import pandas as pd

s = pd.Series(index=[2, 4, 5, 6], data=[None, None, 2, None])

which creates this Series:

2    NaN
4    NaN
5    2.0
6    NaN
dtype: float64

You can get the first non-NaN value by using:

s.loc[~s.isnull()].iloc[0]

which returns

2.0
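`dropna()` removes exactly the entries that `isnull()` flags, so `s.dropna().iloc[0]` is an equivalent, slightly shorter form. Either way, `.iloc[0]` raises `IndexError` when no value survives the filter, so here is a minimal sketch that also guards the all-null case (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(index=[2, 4, 5, 6], data=[None, None, 2, None])

non_null = s.dropna()            # keeps the original labels: only label 5 remains
first = non_null.iloc[0]         # 2.0, same as s.loc[~s.isnull()].iloc[0]
first_label = non_null.index[0]  # 5

# Guard for a Series with no non-null values, where .iloc[0] would raise IndexError
all_null = pd.Series([np.nan, np.nan])
non_null = all_null.dropna()
first_or_none = non_null.iloc[0] if not non_null.empty else None  # None
```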

If, on the other hand, you have a DataFrame like this one:

df = pd.DataFrame(index=[2,4,5,6], data=np.asarray([[None, None, 2, None], [1, None, 3, 4]]).transpose(), 
                  columns=['a', 'b'])

which looks like this:

    a       b
2   None    1
4   None    None
5   2       3
6   None    4

you can select the first non-null value of each column separately; for column `a`:

df.a.loc[~df.a.isnull()].iloc[0]

or, if you want the first row that has no null values in any column, use:

df.loc[df.notna().all(axis=1)].iloc[0]

This returns:

a    2
b    3
Name: 5, dtype: object
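Two shorter alternatives, as a sketch: `bfill()` back-fills each column, so the first row of the result holds every column's first non-null value at once, and `dropna()` (default `how='any'`) keeps only rows without nulls. The sketch assumes float columns with `NaN` instead of the object-dtype `None` frame above, purely to keep the dtypes simple.

```python
import numpy as np
import pandas as pd

# Same layout as the example frame, but with float columns (assumption)
df = pd.DataFrame(
    {"a": [np.nan, np.nan, 2.0, np.nan], "b": [1.0, np.nan, 3.0, 4.0]},
    index=[2, 4, 5, 6],
)

# After back-filling, row 0 holds each column's first non-null value
# (NaN if a column is entirely null)
first_per_column = df.bfill().iloc[0]  # a -> 2.0, b -> 1.0

# Index label of the first non-null value, one per column
first_labels = df.apply(lambda col: col.first_valid_index())  # a -> 5, b -> 2

# First row with no nulls in any column; guard against there being none
complete = df.dropna()
first_complete_row = complete.iloc[0] if not complete.empty else None  # label 5
```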

Danil · Answer 3 · score 5 · 2019-11-03T15:35:24.003

You can also use the `get` method instead of `loc`:

(Pdb) type(audio_col)
<class 'pandas.core.series.Series'>
(Pdb) first_audio_idx = audio_col.first_valid_index()
(Pdb) first_audio_idx
19
(Pdb) audio_col.get(first_audio_idx)
'first-not-nan-value.ogg'
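A self-contained version of that debugger session, as a sketch: the filenames and index labels are made up to mirror the output shown. Unlike `loc`, `Series.get` returns its `default` argument (`None` unless given) instead of raising `KeyError` when the label is absent.

```python
import pandas as pd

# Hypothetical audio column: labels 17-20, first non-null entry at label 19
audio_col = pd.Series(
    [None, None, "first-not-nan-value.ogg", "second.ogg"],
    index=[17, 18, 19, 20],
)

first_audio_idx = audio_col.first_valid_index()  # 19
first_audio = audio_col.get(first_audio_idx)     # 'first-not-nan-value.ogg'

# A label that does not exist returns the default instead of raising KeyError
missing = audio_col.get(99, default="no-audio")  # 'no-audio'
```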

edited Nov 03 '19 at 15:35 · answered Nov 03 '19 at 11:56 by Danil
