I am looking through a DataFrame containing different kinds of data whose usefulness I'm trying to evaluate, so I look at each column and check what kind of data it holds. E.g.

print(extract_df['Auslagenersatz'])

For some columns I get output like this:

0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
       ..
263   NaN
264   NaN
265   NaN
266   NaN
267   NaN

I would like to check whether that column contains any information at all, so what I am looking for is something like

s = extract_df['Auslagenersatz']
print(s.loc[s == True])

where I am assuming that NaN is interpreted as False, in the same way an empty set is. I would like it to return only those elements of the Series that are not empty. The code above does not work, however: I get an empty result even for columns that I know have non-NaN entries.

I based this on the post "How to select rows from a DataFrame based on column values",

but I can't figure out where I'm going wrong or how to do this instead. The problem comes up often, so any help is much appreciated.

lpnorm
  • 1
    Surprise! `bool(np.nan)` is `True`. Use `s[s.notnull()]`: no tricks, and the code is self-documenting – ALollz Mar 16 '21 at 17:46
  • 2
    you can also make use of `dropna()` method i.e `extract_df['Auslagenersatz'].dropna()` – Anurag Dabas Mar 16 '21 at 17:48
  • @ALollz this is outrageous lol. Goes very much against my understanding of how this boolean interpretation concept works. Thanks to the both of you. – lpnorm Mar 16 '21 at 17:53
  • @lpnorm yeah I've had that issue a few times. It used to be a huge issue with missing data, i.e. `pd.Series([True, False, np.nan]).astype('bool')`. Luckily `pandas` has improved and now there's a dedicated Boolean type, so `pd.Series([True, False, np.nan]).astype('boolean')` behaves more _expectedly_ for missing data. – ALollz Mar 16 '21 at 17:58
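
The behaviour discussed in these comments can be checked with a short sketch. The Series `s` and its values are made up for illustration and are not the asker's data; the intent is only to show why `s == True` selects nothing while `notna()` keeps every non-missing entry, and how the two boolean dtypes treat a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
s = pd.Series([np.nan, 1.5, np.nan, 0.0])

# NaN is truthy in plain Python, so it is not treated like an empty container.
nan_is_truthy = bool(np.nan)

# `s == True` compares each value with True; NaN == True is False,
# and only values equal to 1 would match, so nothing is selected here.
eq_true = s[s == True]

# notna() marks every non-missing entry, including 0.0.
non_missing = s[s.notna()]

# Casting to the NumPy-backed bool dtype turns NaN into True ...
as_bool = pd.Series([True, False, np.nan]).astype("bool")

# ... while the nullable "boolean" dtype keeps it as a missing value.
as_boolean = pd.Series([True, False, np.nan]).astype("boolean")

print(nan_is_truthy)
print(eq_true)
print(non_missing)
print(as_bool)
print(as_boolean)
```

Note that `notna()` is an alias of `notnull()`, so either spelling should give the same mask.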

1 Answer

import pandas as pd
df = pd.DataFrame({'A':[2,3,None, 4,None], 'B':[2,13,None, None,None], 'C':[None,3,None, 4,None]})

If you want to see only the rows where column A is not NA, then:

df[~df['A'].isna()]
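
To answer the original question of whether a column holds any information at all, a per-column summary may be more convenient than printing each column. The following is a minimal sketch along those lines; the extra column `D` is a hypothetical all-missing column added for illustration and is not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 3, None, 4, None],
                   'B': [2, 13, None, None, None],
                   'C': [None, 3, None, 4, None],
                   'D': [None] * 5})  # hypothetical column with no data at all

# True for each column that has at least one non-missing value.
has_data = df.notna().any()

# Number of non-missing values per column.
counts = df.count()

# Keep only the columns that contain some data.
usable = df.loc[:, has_data]

# The non-missing values of a single column.
a_values = df['A'].dropna()

print(has_data)
print(counts)
print(usable.columns.tolist())
print(a_values)
```

Here `df.count()` gives a quick sense of how sparse each column is, which may help when judging its usefulness.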
BetterCallMe