
I have already sorted the two dataframes:

city_future:
    City           Future_50
7   Atlanta        1
9   Bal Harbour    1
1   Chicago        8
6   Coalinga       1

independents_future:
    City           independents_100
14  Amarillo       1
10  Atlanta        2
18  Atlantic City  1
20  Austin         1

This is what I have so far:

city_future = future.loc[:,"City"].value_counts().rename_axis('City').reset_index(name='Future_50').sort_values('City')
city_independents = independents.loc[:,"City"].value_counts().rename_axis('City').reset_index(name='independents_100').sort_values('City')
hot_cities = pd.merge(city_independents,city_future)
hot_cities

I need to show all the cities from both dataframes, which have different lengths, and put 0 for any city that does not appear in the other dataframe. I have no idea why my current output only shows 20 rows. It looks like this:

    City         independents_100  Future_50
0   Atlanta      2                 1
1   Bal Harbour  1                 1
2   Chicago      15                8

Thank you for helping!

  • Have you had a look at [this post](https://stackoverflow.com/questions/53645882/pandas-merging-101)? In your case you probably want to try merge with `how='outer'`. – Yolao_21 Apr 14 '22 at 21:18
  • Right. You're doing an inner join, which will only keep rows that are in BOTH dataframes. To get the rows in EITHER dataframe, you need an outer join. – Tim Roberts Apr 14 '22 at 21:30
  • Oh, I see. I am very new to Python, so I don't really know how to use merge. Thank you for the extra material! – Jeffrey Ma Apr 15 '22 at 00:25

1 Answer


I believe you can use the merge method to do this without creating the two helper dataframes.

Setting indicator=True adds a column named _merge to the resulting dataframe. It tells you whether each row appears in the left dataframe only (city_future), the right dataframe only (independents_future), or both.

merged_df = city_future.merge(right=independents_future,
    left_on='City',
    right_on='City',
    how='outer',
    indicator=True
)
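
An outer merge leaves NaN wherever a city is missing from one side, while the question asks for 0 there. Here is a minimal sketch of one way to fill those gaps. It assumes the two frames look like the samples in the question (the data below is just those few rows retyped), and the names hot_cities and count_cols are only illustrative.

```python
import pandas as pd

# Sample rows retyped from the question; the real frames are longer.
city_future = pd.DataFrame({
    "City": ["Atlanta", "Bal Harbour", "Chicago", "Coalinga"],
    "Future_50": [1, 1, 8, 1],
})
independents_future = pd.DataFrame({
    "City": ["Amarillo", "Atlanta", "Atlantic City", "Austin"],
    "independents_100": [1, 2, 1, 1],
})

# how="outer" keeps every city that appears in either frame.
hot_cities = city_future.merge(independents_future, on="City", how="outer")

# Cities missing from one side come back as NaN, which also turns the
# count columns into floats. Fill with 0 and convert back to integers.
count_cols = ["Future_50", "independents_100"]
hot_cities[count_cols] = hot_cities[count_cols].fillna(0).astype(int)

hot_cities = hot_cities.sort_values("City").reset_index(drop=True)
print(hot_cities)
```

With the default how="inner", the same sample data would keep only Atlanta, since it is the only city present in both frames. That is most likely why the original output stopped at 20 rows.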

See the pandas.DataFrame.merge reference page for details.

Hope this helps :)

Kevin