Issues sorting dataframe using isin

Question

I have a dataframe, loaded from a CSV with pd.read_csv, that contains information about California counties; it looks something like this:

County	...
Fresno	...
San Diego	...
Madera	...
San Bernardino	...
Stanislaus	...
...	...

I have a list of counties I'm specifically interested in:

relevant_counties = ['Fresno','Madera','San Diego','Stanislaus']

I'm trying to use

filtered_df = df.loc[df['County'].isin(relevant_counties)]

but it doesn't seem to quite recognize all of the counties. It'll return San Diego's row, but it won't recognize Fresno, Madera, or Stanislaus in the dataframe. I think it has something to do with how the strings in my relevant_counties list are stored as opposed to the county name in the dataframe, but I don't know how to fix this.

Thank you!

EDIT: df['County'].unique() returned:

array(['Fresno ', 'San Bernardino', 'San Joaquin', 'Los Angeles',
       'Stanislaus ', 'Kern ', 'Riverside ', 'San Diego', 'Sacramento ',
       'Merced ', 'Kings ', 'Alameda ', 'Ventura ', 'Imperial ',
       'Orange', 'Tulare', 'Madera', 'Contra Costa', 'Yolo',
       'Santa Clara', 'San Francisco', 'Solano', 'San Mateo', 'Yuba',
       'Butte ', 'Santa Cruz', 'Monterey ', 'Sutter ', 'Sonoma ',
       'Santa Barbara',' Napa',' San Benito',' Tehama','Nevada',
       'Marin ', 'Glenn ', 'Mendocino ', 'Placer ', 'Siskiyou ',
       'Colusa', 'Shasta', 'Tuolumne', 'Inyo', 'Amador', 'Humboldt',
       'St. Louis', 'Lake', 'Scalveras', 'Modoc', 'Lassen',
       'Feathers', 'Sierra', 'Butterfly', 'El Dorado', 'Northern',
       'Mono ', 'Trinity ', 'Alpine '], dtype=object)

I see that some of the county names have some excess white space, so that probably gets in the way. How would I go about removing it all efficiently? However, in the case of 'Madera', it doesn't seem to have the white space but I am still having a hard time searching for it.

Can you provide the output of `df['County'].unique()` (as an edit to your question)? - mozway, May 14 '22 at 01:56

score 0 · Answer 1 · answered May 14 '22 at 03:20

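You could use str.contains() instead of isin(). By default contains() does a regex substring match, so the leading or trailing whitespace around the county names no longer prevents a match: df[df['County'].str.contains('|'.join(relevant_counties))]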
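
Here is a minimal sketch of both approaches side by side. The sample frame is hypothetical (a handful of values copied from the `unique()` output in the question), and the names `raw`, `clean`, and `contains_df` are only for illustration. It compares the substring match with stripping the whitespace first and then keeping the exact `isin` match:

```python
import pandas as pd

# Hypothetical sample: a few values copied from the question's unique() output,
# including the stray leading/trailing spaces.
raw = pd.DataFrame(
    {"County": ["Fresno ", "San Diego", "Madera", "San Bernardino", "Stanislaus ", " Napa"]}
)
relevant_counties = ["Fresno", "Madera", "San Diego", "Stanislaus"]

# Approach 1: regex substring match; whitespace around the name is ignored
# because the pattern only has to appear somewhere in the string.
contains_df = raw[raw["County"].str.contains("|".join(relevant_counties))]

# Approach 2: strip leading/trailing whitespace, then use an exact match.
clean = raw.assign(County=raw["County"].str.strip())
filtered_df = clean.loc[clean["County"].isin(relevant_counties)]

print(contains_df["County"].tolist())
print(filtered_df["County"].tolist())
```

Note that `str.contains` matches substrings, so a pattern like `Lake` would also hit any longer name containing it, and names with regex metacharacters (such as the `.` in `St. Louis`) would need escaping. If you want exact matches, stripping and then using `isin` is probably the safer choice. If a value still fails to match after `str.strip()`, it may contain a character that only looks like a plain space; printing `repr()` of the value should show it.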

PaNh
