If I have a Series whose values are either NULL or some non-null value, how can I find the first row where the value is not NULL, so that I can report its datatype back to the user? All non-null values in a given Series share the same datatype.
Dupe: http://stackoverflow.com/questions/23309514/computing-the-first-non-missing-value-from-each-column-in-a-dataframe – EdChum Feb 09 '17 at 13:13
3 Answers
You can use `first_valid_index` and then select the value with `loc`:
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2, np.nan])
print(s)
0    NaN
1    2.0
2    NaN
dtype: float64
print(s.first_valid_index())
1
print(s.loc[s.first_valid_index()])
2.0
# If your Series contains ALL NaNs, you'll need to check as follows:
s = pd.Series([np.nan, np.nan, np.nan])
idx = s.first_valid_index() # Will return None
first_valid_value = s.loc[idx] if idx is not None else None
print(first_valid_value)
None
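Since the goal is to report the datatype, here is a minimal sketch that wraps the approach above in a helper (the name `first_value_type` is hypothetical, not a pandas API) and returns the type name of the first non-null value. It inspects the value itself rather than `s.dtype`, because a Series of strings, for example, typically has the generic `object` dtype. It assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

def first_value_type(s: pd.Series):
    """Return the type name of the first non-null value in s, or None if s is all null."""
    idx = s.first_valid_index()
    if idx is None:                # every value is null
        return None
    value = s.loc[idx]             # assumes idx is a unique index label
    return type(value).__name__

print(first_value_type(pd.Series([np.nan, 2, np.nan])))   # float64 (a numpy.float64 scalar)
print(first_value_type(pd.Series([None, "a.ogg", None])))  # str
print(first_value_type(pd.Series([np.nan, np.nan])))      # None
```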

The Aelfinn

jezrael
If the series contains duplicate index values, `s.loc[idx]` would actually return a series. @jezrael do you think there's a good general solution that will also work in that case or is conditioning on the type of `first_valid_value` inevitable? – stav May 19 '20 at 14:50
@jezrael For the last valid value, would you just reverse the series and use the same fn? – jtlz2 Feb 02 '23 at 10:42
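Regarding the two comments above, a minimal sketch (the helper `first_and_last_valid` is hypothetical) that looks values up by position instead of by label, so duplicate index labels cannot turn the result into a Series, and that returns the last non-null value as well. For the label of the last valid entry, pandas also provides `last_valid_index()`, so reversing the Series should not be necessary.

```python
import numpy as np
import pandas as pd

def first_and_last_valid(s: pd.Series):
    """Return the (first, last) non-null values of s by position, or (None, None)."""
    non_null = s.dropna()              # drops NaN/None but keeps the original order
    if non_null.empty:                 # all values were null
        return None, None
    # iloc is positional, so duplicate index labels cannot turn the result into a Series
    return non_null.iloc[0], non_null.iloc[-1]

# Label 1 appears twice, so s.loc[s.first_valid_index()] returns a two-row Series here
s = pd.Series([np.nan, 2, np.nan, 5], index=[0, 1, 1, 2])
print(first_and_last_valid(s))

# For labels rather than values, last_valid_index() mirrors first_valid_index()
print(s.first_valid_index(), s.last_valid_index())
```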
For a Series, the following returns the first non-null value. First, create a Series s:
import numpy as np
import pandas as pd

s = pd.Series(index=[2, 4, 5, 6], data=[None, None, 2, None])
which creates this Series:
2    NaN
4    NaN
5    2.0
6    NaN
dtype: float64
You can get the first non-NaN value by using:
s.loc[~s.isnull()].iloc[0]
which returns
2.0
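The boolean mask can also be turned into an integer position, which avoids index labels entirely. A minimal sketch with a hypothetical helper `first_non_null_position`; note that `s.loc[~s.isnull()].iloc[0]` raises an IndexError when every value is null, which the guard below avoids.

```python
import numpy as np
import pandas as pd

def first_non_null_position(s: pd.Series):
    """Return the integer position of the first non-null entry, or None if there is none."""
    mask = s.notna().to_numpy()    # same as ~s.isnull(), as a NumPy bool array
    if not mask.any():             # argmax of an all-False array would wrongly return 0
        return None
    return int(mask.argmax())      # argmax returns the position of the first True

s = pd.Series(index=[2, 4, 5, 6], data=[None, None, 2, None])
pos = first_non_null_position(s)
print(pos, s.iloc[pos])
```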
If, on the other hand, you have a DataFrame like this one:
df = pd.DataFrame(index=[2,4,5,6], data=np.asarray([[None, None, 2, None], [1, None, 3, 4]]).transpose(),
columns=['a', 'b'])
which looks like this:
      a     b
2  None     1
4  None  None
5     2     3
6  None     4
you can select the first non-null value of each column like this (here for column a):
df.a.loc[~df.a.isnull()].iloc[0]
or, if you want the first row that contains no null values at all, you can use:
df.loc[df.notna().all(axis=1)].iloc[0]
Which returns:
a    2
b    3
Name: 5, dtype: object
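For the DataFrame case, a sketch of a few vectorised alternatives. It assumes the frame is built with `np.nan` instead of the object array used above, so both columns are float64. Back-filling and taking the top row gives the first non-null value of every column at once, `first_valid_index` applied per column gives the corresponding index labels, and `dropna()` gives the first fully non-null row.

```python
import numpy as np
import pandas as pd

# Same layout as the example above, built with np.nan so both columns are float64
df = pd.DataFrame(
    {"a": [np.nan, np.nan, 2, np.nan], "b": [1, np.nan, 3, 4]},
    index=[2, 4, 5, 6],
)

# First non-null value of every column: back-fill upwards, then take the top row
first_per_column = df.bfill().iloc[0]

# Index label of the first non-null value in each column (None for an all-null column)
first_label_per_column = df.apply(pd.Series.first_valid_index)

# First row with no nulls at all (same rows as the boolean-mask version)
first_complete_row = df.dropna().iloc[0]

print(first_per_column, first_label_per_column, first_complete_row, sep="\n\n")
```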

PdevG
You can also use the `get` method instead:
(Pdb) type(audio_col)
<class 'pandas.core.series.Series'>
(Pdb) first_audio_idx = audio_col.first_valid_index()
(Pdb) first_audio_idx
19
(Pdb) audio_col.get(first_audio_idx)
'first-not-nan-value.ogg'
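A self-contained version of the debugger session above; the contents of `audio_col` are hypothetical, made up for illustration. `Series.get` looks up a label like `loc` does, but returns a default (None unless you pass one) instead of raising `KeyError` when the label is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for audio_col from the debugger session
audio_col = pd.Series(
    [None, np.nan, "first-not-nan-value.ogg", "second.ogg"],
    index=[17, 18, 19, 20],
)

first_audio_idx = audio_col.first_valid_index()   # None and NaN both count as missing
value = audio_col.get(first_audio_idx)            # same result as .loc for an existing label

# Unlike .loc, get() returns the default instead of raising KeyError for a missing label
fallback = audio_col.get(999, default="no audio")

print(first_audio_idx, value, fallback)
```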

Danil