Print out a specific set of rows of a dataset based on conditions

Question

What I am trying:

import re
new_df = census_df.loc[(census_df['REGION']==1 | census_df['REGION']== 2) & (census_df['CTYNAME'].str.contains('^Washington[a-z]*'))& (census_df['POPESTIMATE2015']>census_df['POPESTIMATE2014'])]
new_df

It returns this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Welcome to SO. Could you please read https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples and rephrase your question so that others can reproduce it? - Roy2012, Jun 12 '20 at 09:52
You are not using the re module, so you might not need to import it. Also, please provide a sample of the census_df dataframe content. - Gustav Rasmussen, Jun 12 '20 at 09:56

Gustav Rasmussen · Accepted Answer · 2020-06-12T10:52:44.010

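You need to put parentheses around each comparison in filt_1. In Python, | binds more tightly than ==, so without parentheses the condition is read as census_df['REGION'] == (1 | census_df['REGION']) == 2, a chained comparison that asks for the truth value of a Series and raises the ValueError. With parentheses, each comparison produces a boolean Series first: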

filt_1 = (census_df['REGION'] == 1)  | (census_df['REGION'] == 2)
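
To see why the parentheses matter, here is a minimal sketch on a small made-up Series (the name region and its values are assumptions for illustration, not taken from the real census data). It triggers the same ValueError without parentheses and builds the intended mask with them:

```python
import pandas as pd

# Hypothetical stand-in for census_df['REGION']
region = pd.Series([1, 2, 3], name="REGION")

# Without parentheses, `|` is applied before `==`, so Python reads
#   region == (1 | region) == 2
# which is a chained comparison and needs the truth value of a Series.
try:
    region == 1 | region == 2
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With parentheses, each comparison yields a boolean Series first,
# and `|` then combines the two masks element by element.
mask = (region == 1) | (region == 2)
mask_list = mask.tolist()

print(error_message)
print(mask_list)
```

The same rule applies to & and ~: wrap every comparison in its own parentheses before combining masks.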

Note that my data for census_df is semi-fictitious, but it is enough to demonstrate the behaviour. Everything from the filt_1 assignment onwards will work unchanged on your full census_df dataframe. This is the full program:

import pandas as pd

cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]

census_df = pd.DataFrame(data, columns=cols)

filt_1 = (census_df['REGION'] == 1)  | (census_df['REGION'] == 2)
filt_2 = census_df['CTYNAME'].str.contains("^Washington[a-z]*")
filt_3 = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']

filt = filt_1 & filt_2 & filt_3

new_df = census_df.loc[filt]

print(new_df)

Returns:

   REGION     CTYNAME  POPESTIMATE2014  POPESTIMATE2015
0       1  Washington          4846411          4858979
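
As a possible alternative, the region test can be written with isin, which avoids the | operator altogether, and the name test with str.startswith, since the pattern ^Washington[a-z]* only requires the name to begin with "Washington" ([a-z]* may match nothing). This is a sketch under assumptions: sample_df and its rows are invented for illustration and are not real census figures.

```python
import pandas as pd

cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
# Invented rows, chosen so that each condition excludes one of them
data = [[1, "Washington County", 100, 110],   # passes all three filters
        [2, "Washington County", 200, 190],   # population decreased
        [3, "Washington County", 300, 310],   # region is not 1 or 2
        [1, "Adams County", 400, 410]]        # name does not match

sample_df = pd.DataFrame(data, columns=cols)

# isin replaces (REGION == 1) | (REGION == 2)
in_region = sample_df['REGION'].isin([1, 2])
# startswith replaces the anchored regular expression
starts_washington = sample_df['CTYNAME'].str.startswith('Washington')
grew = sample_df['POPESTIMATE2015'] > sample_df['POPESTIMATE2014']

result = sample_df.loc[in_region & starts_washington & grew]

print(result)
```

Only the first row should remain, because it is the only one that satisfies all three masks.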
