filtering multiple criteria, including not nan

Question

I am trying to get a subset of data from a dataframe, based on two criteria, where Region is like 'nebraska' and where metric1 is not nan

The region filter works, but the nan filter does not and I am unsure why. Would appreciate some pointers on why it does not work.

Thanks!

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import pyodbc


data = [{ 'Region':'nebraska', 'metric1':50},
        { 'Region':'nebraska', 'metric1':np.nan},
        { 'Region':'nebraska', 'metric1':50},
        { 'Region':'nebraska', 'metric1':50},
        { 'Region':'oaklahoma', 'metric1':np.nan},
        { 'Region':'oaklahoma', 'metric1':50},
        { 'Region':'oaklahoma', 'metric1':np.nan},]

testDataset = pd.DataFrame(data)

testDataset[(testDataset.Region.str.contains('nebraska')) & (testDataset.metric1 != np.nan)]

did the below solution help? feel free to accept if it did, or ask for clarification. — jpp, Mar 10 '18 at 00:07

score 0 · Answer 1 · answered Mar 02 '18 at 18:25

This should work:

testDataset[testDataset['Region'].str.contains('nebraska', regex=False, na=False) & \
            pd.notnull(testDataset['metric1'])]

np.nan == np.nan is False (see explanation) which is why your method does not work.

I've also added regex=False, which should improve performance, and na=False so that you do not get errors for unexpected types.

filtering multiple criteria, including not nan

1 Answers1