Search long series for non-NaN entries

Question

I am looking through a DataFrame with different kinds of data whose usefulness I'm trying to evaluate, so I look at each column and check what kind of data it contains. E.g.

print(extract_df['Auslagenersatz'])

For some I get responses like this:

0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
       ..
263   NaN
264   NaN
265   NaN
266   NaN
267   NaN

I would like to check whether that column contains any information at all, so what I am looking for is something like

s = extract_df['Auslagenersatz']
print(s.loc[s == True])

where I am assuming that NaN is interpreted as False in the same way an empty set is. I would like it to return only those elements of the series that satisfy this condition (being not empty). The code above does not work, however: I get an empty result even for columns that I know have non-NaN entries.

I based this on the post "How to select rows from a DataFrame based on column values",

but I can't figure out where I'm going wrong or how to do this instead. The problem comes up often, so any help is much appreciated.

Surprise! `bool(np.NaN)` is `True`. Do `s[s.notnull()]`, no tricks so the code is self documenting — ALollz, Mar 16 '21 at 17:46
you can also make use of `dropna()` method i.e `extract_df['Auslagenersatz'].dropna()` — Anurag Dabas, Mar 16 '21 at 17:48
@ALollz this is outrageous lol. Goes very much against my understanding of how this boolean interpretation concept works. Thanks to the both of you. — lpnorm, Mar 16 '21 at 17:53
@lpnorm yeah I've had that issue a few times. Used to be a huge issue with missing data, i.e. `pd.Series([True, False, np.NaN]).astype('bool')`. Luckily `pandas` has improved and now there's a dedicated Boolean type, so `pd.Series([True, False, np.NaN]).astype('boolean')` behaves more _expectedly_ for missing data. — ALollz, Mar 16 '21 at 17:58
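
To make the points from the comments concrete, here is a minimal sketch. The series values are made up for illustration (only the column name comes from the question), and it uses the `np.nan` spelling because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for extract_df['Auslagenersatz']
s = pd.Series([np.nan, 1.5, np.nan, 0.0, 2.0], name="Auslagenersatz")

# NaN is truthy in plain Python, and no element here compares equal to True,
# so `s == True` selects nothing even though the series holds data.
nan_is_truthy = bool(np.nan)
eq_true = s[s == True]

# Selecting the non-missing entries explicitly, as suggested in the comments
non_missing = s[s.notna()]      # notnull() is an alias of notna()
dropped = s.dropna()            # same result
has_data = s.notna().any()      # quick yes/no check for the whole column

# The plain bool dtype turns NaN into True; the nullable 'boolean' dtype keeps it missing
flags = pd.Series([True, False, np.nan])
as_bool = flags.astype("bool")
as_nullable = flags.astype("boolean")

print(non_missing)
print(has_data)
```

Note that `0.0` survives the filter: `notna()` tests for missing values, not for truthiness.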

score 0 · Answer 1 · answered Mar 16 '21 at 18:36

import pandas as pd
df = pd.DataFrame({'A':[2,3,None, 4,None], 'B':[2,13,None, None,None], 'C':[None,3,None, 4,None]})

If you want to see only the rows where column A is not NA, then:

df[~df['A'].isna()]
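
Since the question is about judging many columns at once, the same idea can be applied to the whole frame. This is only a sketch under assumptions: column `D` is a hypothetical all-missing column added for illustration and is not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2, 3, None, 4, None],
    'B': [2, 13, None, None, None],
    'C': [None, 3, None, 4, None],
    'D': [None] * 5,  # hypothetical column with no data at all
})

# Number of non-NA entries per column
counts = df.count()

# Names of columns that contain nothing but missing values
empty_cols = df.columns[df.isna().all()].tolist()

# The frame without those empty columns
useful = df.dropna(axis=1, how='all')

print(counts)
print(empty_cols)
```

A count of zero means the column carries no information; a small count tells you which columns are worth inspecting with the row filter above.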

BetterCallMe