I have two pandas dataframes, one that is a list of states, cities, and a capital flag with a multiIndex of (state, city), and another that is a non-indexed (or default indexed, if that's more appropriate) list of states and their capitals, I need to perform an inner join on the two and then also find out which items in the cities df are NOT in the join.
Cities:
capital
state city
Ohio Akron N
Toledo N
Columbus N
Colorado Boulder N
Denver N
States:
state city
0 West Virginia Charleston
1 Ohio Columbus
Inner join to find the capital of Ohio:
pd.merge(cities, states, on=['state', 'city'], how='inner')
state city capital
0 Ohio Columbus N
Now I need to get a df that includes everything in the cities df EXCEPT Columbus, Ohio. I've been looking at variations of .isin()
, both with and without reset_index()
, but I can't get it work.
Code to create the cities and states dfs. I have set_index()
as a separate call because if I try to do it when I create the df I get an error about ValueError: Shape of passed values is (3, 3), indices imply (2, 3)
and haven't figured out a way around it.
cities = pd.DataFrame({'state':['Ohio', 'Ohio', 'Ohio', 'Colorado', 'Colorado'], 'city':['Akron', 'Toledo', 'Columbus', 'Boulder', 'Denver'], 'capital':['N', 'N', 'N', 'N', 'N']}, columns=['state', 'city', 'capital'])
cities.set_index(('state', 'city'))
states = pd.DataFrame({'state':['West Virginia', 'Ohio'], 'city':['Charleston', 'Columbus']})