I'm working with a small dataframe (9K rows) that has county and city fields (= columns). Some rows have NaN values for the county but have county names in the city column.
I can find these by doing
filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
df[ filter1 & filter2 ]
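For reference, here is a minimal reproducible version of the setup with made-up data (the real dataframe has 9K rows, but the pattern is the same):

```python
import numpy as np
import pandas as pd

# Made-up sample data: some rows have NaN in 'county'
# but carry a county name in the 'city' column.
df = pd.DataFrame({
    'city':   ['Springfield', 'Greene County', 'Shelbyville', 'Clark County'],
    'county': ['Greene',      np.nan,          'Shelby',      np.nan],
})

filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
print(df[filter1 & filter2])  # the rows I want to fix
```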
For rows that match these criteria, I want to assign the city value to the county field. This Stack Overflow question discusses how to do this, but I can't get any of the examples to work.
I've tried a variety of things:
1. df[filter1 & filter2]['county'] = df[filter1 & filter2]['city']
2. df['county'] = df.apply(lambda x: x['city'] if ( x['county'].isna() & 'County' in x['city'] ) else x['county'] , axis=1 )
3. df['county'] = np.where(df['county']=='unknown' & df['city'].str.contains('County'), df['city'], df['county'] )
I get a variety of errors:
- A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
Helpful feedback, but I couldn't figure out the syntax for using .loc for this. I got as far as
df.loc[df['county'].isna() & df['city'].str.contains('County'), 'county'] = ???
I don't know if .loc would work here, but even if it does, how do I specify the matching row's own city value on the right-hand side?
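Concretely, this is the kind of thing I'm guessing at (assuming .loc accepts a column on the right-hand side and aligns rows by index), shown on the same made-up sample data:

```python
import numpy as np
import pandas as pd

# Made-up sample data with the same shape as the real dataframe.
df = pd.DataFrame({
    'city':   ['Springfield', 'Greene County'],
    'county': ['Greene',      np.nan],
})

mask = df['county'].isna() & df['city'].str.contains('County')
# Is this the right way to pull each matching row's own city value?
df.loc[mask, 'county'] = df.loc[mask, 'city']
print(df)
```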
- TypeError: unsupported operand type(s) for &: 'str' and 'str'
I thought this might be because county is a string field, so the lambda function objected to doing an .isna() test on it. I changed the NaNs to "unknown" but I still get the error. (I've seen a lot of comments about the slowness of .apply(), but on a 9K-row table it should be fine.)
- Raises no errors, but doesn't actually change the county field!
This seems like a pretty basic operation, but I can't figure out the correct syntax to get it to make the change. What am I doing wrong?