1

I'm still learning data manipulation with Pandas and this is the problem which occurred:

Trying to merge two dataframes:

Creating the first one with a string filter

df_mask1 = df[eval(mask1)].groupby(['Country', 'ID']).agg({'Serial': 'count'}).reset_index()

the result looks like this

enter image description here

and doing the same with the second filter creating a second df

df_mask2 = df[eval(mask2)].groupby(['Country', 'ID']).agg({'Serial': 'count'}).reset_index()

enter image description here

and merging the two and filling missing values

merged_df = pd.merge(df_mask1, df_mask2, on=['Country', 'ID'], how='outer', suffixes=('_count_1', '_count_2'))
merged_df.fillna(0, inplace=True)

and the result looks like this

enter image description here

What I'm trying to achieve is having unique values for the Country and ID columns pair. In plain words I need 1 row for ID 88888 which have value 93.0 for Serial_count_1 and 1.0 for Serial_count_2.

Any comments and suggestions are welcomed!

whoisagp
  • 23
  • 7
  • The example you have given isn't really that helpful towards us providing you an answer. You say 1 row for ID 88888, 93.0 1.0 but ID 88888 is shared between Kenya and Pakistan. So if you want to give each country a unique ID you would assume Kenya 11111 and Pakistan 88888 but then you say 93.0 and 1.0 values but they belong to different countries. Please provide an expected output example. – SimonT Aug 01 '23 at 19:38
  • Do you want `merged_df.groupby('ID', as_index=False).sum()`? – mozway Aug 01 '23 at 19:42
  • What is the value of the Country? as you have Kenya and Pakistan with same ID : 88888! – Hamza NABIL Aug 01 '23 at 20:03
  • Thank you for pointing this out! I realized the problem is not with manipulation but in the raw input data - ID should not be shared between countries. – whoisagp Aug 01 '23 at 20:16

0 Answers0