
Before I start: I have found similar questions and tried the corresponding answers; however, I am still running into an issue and can't figure out why.

I have 6 data frames. I want a single data frame that merges all 6, based on their common index column Country. Things to note: the data frames have different numbers of rows, and some countries have no corresponding values, which results in NaN.

Here is what I have tried:

from functools import reduce
import pandas as pd
data_frames = [WorldPopulation_df, WorldEconomy_df, WorldEducation_df, WorldAggression_df, WorldCorruption_df, WorldCyberCapabilities_df]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Country'], how='outer'), data_frames)

This doesn't work: the resulting data frame pairs values with the wrong countries. Any suggestions?
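For reference, here is a minimal sketch of the same chained outer merge on small made-up frames. The frame names (`population`, `economy`, `education`), the `clean` helper, and all values are hypothetical, not the asker's data. With unique, consistently spelled Country values, an outer merge keeps every country and fills the gaps with NaN. Duplicated or inconsistently formatted keys are one possible (unconfirmed) cause of rows looking mismatched, so the sketch strips whitespace from the key and uses `validate` to fail loudly on duplicates.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames standing in for the six real ones.
population = pd.DataFrame(
    {"Country": ["France", "Japan", "Peru"], "Population": [68, 125, 34]}
)
economy = pd.DataFrame(
    {"Country": ["Japan", "France "], "GDP": [4.2, 2.8]}  # note the trailing space
)
education = pd.DataFrame(
    {"Country": ["Peru", "Japan"], "Literacy": [94.5, 99.0]}
)

frames = [population, economy, education]


def clean(df):
    # If Country is the index rather than a column, move it into a column first.
    if df.index.name == "Country":
        df = df.reset_index()
    # Normalize the key so that "France " and "France" match.
    return df.assign(Country=df["Country"].str.strip())


cleaned = [clean(df) for df in frames]

# validate="one_to_one" raises MergeError if Country is duplicated on either side.
merged = reduce(
    lambda left, right: pd.merge(
        left, right, on="Country", how="outer", validate="one_to_one"
    ),
    cleaned,
)
print(merged.sort_values("Country").reset_index(drop=True))
```

If `validate` raises on the real data, the duplicated Country values in that frame are worth inspecting before merging, since each duplicate pair multiplies rows in the result.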

intellgirl
  • 1
    What are you expecting? Users need to see the data (or a sample of it) to help you: https://stackoverflow.com/help/minimal-reproducible-example – D.L Jul 10 '22 at 00:47
  • 1
    The merge looks fine. Hard to say what's wrong without seeing the data or output. Can you include [enough of a dataframe to reproduce](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? – Nick ODell Jul 10 '22 at 00:47
  • With that chain merge, I am curious to see this result: *wrong values with wrong country*. Please post sample input data and current results. – Parfait Jul 10 '22 at 02:43

1 Answer


Let's see: "pd.merge" is used when you want to add new columns based on a key. If instead your 6 data frames all have the same columns, in the same order, you can stack their rows with "pd.concat":

columns_order = ['country', 'column_1']
concat_ = pd.concat(
    [data_1[columns_order], data_2[columns_order], data_3[columns_order], 
     data_4[columns_order], data_5[columns_order], data_6[columns_order]],
    ignore_index=True,
    axis=0
)

From here, if you want a single row per value of the "country" column, you can group by it and aggregate:

concat_.groupby(by=['country']).agg({'column_1': 'max'}).reset_index()
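A minimal sketch of this concat-then-group approach on made-up data follows. The frames `data_1` and `data_2` and their values are hypothetical. Note that this only fits when every frame carries the same columns: it stacks rows rather than adding new columns, so it is not equivalent to the outer merge in the question.

```python
import pandas as pd

columns_order = ["country", "column_1"]

# Hypothetical frames that share the same two columns.
data_1 = pd.DataFrame({"country": ["France", "Japan"], "column_1": [1.0, 5.0]})
data_2 = pd.DataFrame({"country": ["Japan", "Peru"], "column_1": [7.0, 2.0]})

# Stack the rows of all frames into one long frame.
stacked = pd.concat(
    [data_1[columns_order], data_2[columns_order]],
    ignore_index=True,
)

# Collapse to one row per country, keeping the largest column_1 value.
result = stacked.groupby("country", as_index=False).agg({"column_1": "max"})
print(result)
```

Passing the string "max" instead of the builtin `max` lets pandas use its own aggregation directly.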