I'm working with three dataframes: df_1, df_2, and df_3.
They have different numbers of rows and columns and hold different information. Each dataframe is indexed by country name, which is what connects them.
The goal is to find the intersection of the three and count how many unique entries are lost by intersecting them.
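To make the goal concrete, here is a minimal sketch on made-up data. The three toy frames and their column names are hypothetical, not the real datasets. It works on the indexes alone and counts each dropped country once, which is only one possible reading of "unique entries lost".

```python
import pandas as pd

# Hypothetical toy frames indexed by country name (not the real data).
df_1 = pd.DataFrame({"a": [1, 2, 3]}, index=["France", "Japan", "Peru"])
df_2 = pd.DataFrame({"b": [4, 5, 6]}, index=["France", "Japan", "Kenya"])
df_3 = pd.DataFrame({"c": [7, 8]}, index=["France", "Chile"])

# Countries present in all three frames.
common = df_1.index.intersection(df_2.index).intersection(df_3.index)

# Countries present in at least one frame.
everything = df_1.index.union(df_2.index).union(df_3.index)

# Distinct countries that the intersection drops.
lost = everything.difference(common)
print(len(common), len(lost))  # prints: 1 4
```

Here Japan appears in two of the frames but is counted once, because the count is over distinct country names.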
I start by getting the dataframes from the function that creates them, then merge on the index:
import pandas as pd

def count_lost_rows():  # hypothetical wrapper name; the post only shows the function body
    df_1, df_2, df_3 = load_data()
    # Inner joins on the index keep only countries present in all three dataframes.
    merged_1 = pd.merge(df_1, df_2, how='inner', left_index=True, right_index=True)
    merged_2 = pd.merge(merged_1, df_3, how='inner', left_index=True, right_index=True)
    # For each dataframe, keep the rows whose country is missing from the intersection.
    unique_df_1 = pd.merge(df_1, merged_2, how='left', left_index=True, right_index=True,
                           indicator=True).query('_merge == "left_only"')
    unique_df_2 = pd.merge(df_2, merged_2, how='left', left_index=True, right_index=True,
                           indicator=True).query('_merge == "left_only"')
    unique_df_3 = pd.merge(df_3, merged_2, how='left', left_index=True, right_index=True,
                           indicator=True).query('_merge == "left_only"')
    # Total number of rows dropped, summed over the three dataframes.
    return len(unique_df_1) + len(unique_df_2) + len(unique_df_3)
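For comparison, the same per-dataframe total can probably be computed from the indexes alone, without the indicator merges. This is only a sketch: count_dropped_rows and the toy frames are hypothetical names and data, and it assumes each dataframe's country index has no duplicate labels.

```python
import pandas as pd

def count_dropped_rows(frames):
    """Sum, over each frame, the rows whose index label is not in every frame."""
    common = frames[0].index
    for frame in frames[1:]:
        common = common.intersection(frame.index)
    # ~isin(common) flags the rows an inner join on the index would discard.
    return sum(int((~frame.index.isin(common)).sum()) for frame in frames)

# Hypothetical toy frames indexed by country name (not the real data).
df_1 = pd.DataFrame({"a": [1, 2, 3]}, index=["France", "Japan", "Peru"])
df_2 = pd.DataFrame({"b": [4, 5, 6]}, index=["France", "Japan", "Kenya"])
df_3 = pd.DataFrame({"c": [7, 8]}, index=["France", "Chile"])

print(count_dropped_rows([df_1, df_2, df_3]))  # prints: 5
```

Note that this sum counts a country once per dataframe it is dropped from (Japan counts twice here), so it can be larger than the number of distinct countries lost.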
This is my very first post on Stack Overflow, so I hope I did everything correctly. Apologies if I did not, or if my writing was unclear.