I have a dataframe, loaded from a CSV with pd.read_csv,
that holds information about California counties. It looks something like this:
County | ... |
---|---|
Fresno | ... |
San Diego | ... |
Madera | ... |
San Bernardino | ... |
Stanislaus | ... |
... | ... |
I have a list of counties I'm specifically interested in:
relevant_counties = ['Fresno','Madera','San Diego','Stanislaus']
I'm trying to use
filtered_df = df.loc[df['County'].isin(relevant_counties)]
but it doesn't match all of the counties. It returns San Diego's row, but not the rows for Fresno, Madera, or Stanislaus. I suspect the strings in my relevant_counties list differ somehow from the county names stored in the dataframe, but I don't know how to fix this.
Thank you!
EDIT:
df['County'].unique()
returned:
array(['Fresno ', 'San Bernardino', 'San Joaquin', 'Los Angeles',
'Stanislaus ', 'Kern ', 'Riverside ', 'San Diego', 'Sacramento ',
'Merced ', 'Kings ', 'Alameda ', 'Ventura ', 'Imperial ',
'Orange', 'Tulare', 'Madera', 'Contra Costa', 'Yolo',
'Santa Clara', 'San Francisco', 'Solano', 'San Mateo', 'Yuba',
'Butte ', 'Santa Cruz', 'Monterey ', 'Sutter ', 'Sonoma ',
'Santa Barbara',' Napa',' San Benito',' Tehama','Nevada',
'Marin ', 'Glenn ', 'Mendocino ', 'Placer ', 'Siskiyou ',
'Colusa', 'Shasta', 'Tuolumne', 'Inyo', 'Amador', 'Humboldt',
'St. Louis', 'Lake', 'Scalveras', 'Modoc', 'Lassen',
'Feathers', 'Sierra', 'Butterfly', 'El Dorado', 'Northern',
'Mono ', 'Trinity ', 'Alpine '], dtype=object)
I see that some of the county names have excess whitespace, which is probably what gets in the way. How would I go about removing it all efficiently? However, 'Madera' doesn't appear to have any extra whitespace, and I'm still having a hard time matching it.
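For reference, here is a minimal sketch of the kind of cleanup I have in mind, assuming the stray characters are ordinary leading or trailing whitespace. The small dataframe and its Value column are hypothetical stand-ins for my real data. The loop at the end prints each name with repr() and its length, which may help show whether a name like 'Madera' contains a character that isn't visible in the unique() output.

```python
import pandas as pd

# Hypothetical sample that mirrors a few values from df['County'].unique()
df = pd.DataFrame({
    "County": ["Fresno ", "San Diego", "Madera", "San Bernardino", "Stanislaus ", " Napa"],
    "Value": [1, 2, 3, 4, 5, 6],  # placeholder column, not from the real data
})
relevant_counties = ["Fresno", "Madera", "San Diego", "Stanislaus"]

# Strip leading and trailing whitespace from every county name in one vectorized call
df["County"] = df["County"].str.strip()

# Keep only the rows whose cleaned county name is in the list
filtered_df = df.loc[df["County"].isin(relevant_counties)]
print(filtered_df)

# Diagnostic: repr() and len() expose characters that a plain print would hide
for name in df["County"].unique():
    print(repr(name), len(name))
```

On this sample the filter keeps the Fresno, San Diego, Madera, and Stanislaus rows. Whether it fixes the 'Madera' case in the real data depends on what is actually stored in that cell.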