
I have 3 DataFrames and I'm having an issue while merging them. I have narrowed the issue down to the following.

The first 2 DataFrames (energy and ScimEn) have a column called 'Country' by default, but in the 3rd one (GDP) it is called 'Country Name'.

When I merge without renaming the column in the 3rd DataFrame, new_Df2 has 322 rows:

import pandas as pd

# Method 1: keep GDP's original key column name, 'Country Name'
columns = ['Country Name','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']
GDP = GDP[columns]

# Inner merges: keep only countries present in all three frames
newDf = (pd.merge(energy, GDP, how='inner', left_on='Country', right_on='Country Name')
              .merge(ScimEn, how='inner', on='Country'))

# Outer merges: keep every country from every frame (322 rows here)
new_Df2 = pd.merge(energy, GDP, how='outer', left_on='Country', right_on='Country Name')
new_Df2 = pd.merge(new_Df2, ScimEn, how='outer', on='Country')
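To see what method 1 produces, here is a minimal sketch on hypothetical toy frames (the country names and values are invented for illustration, not taken from the real files). When left_on and right_on name different columns, pandas keeps both key columns, so a country that exists only in GDP ends up with NaN in 'Country'.

```python
import pandas as pd

# Hypothetical toy stand-ins for energy and GDP
energy = pd.DataFrame({"Country": ["France", "Japan"], "Energy Supply": [10, 20]})
GDP = pd.DataFrame({"Country Name": ["France", "Spain"], "2015": [2.4, 1.2]})

merged = pd.merge(energy, GDP, how="outer", left_on="Country", right_on="Country Name")

# Both key columns survive; the GDP-only row ("Spain") has NaN in "Country",
# and the energy-only row ("Japan") has NaN in "Country Name"
print(merged[["Country", "Country Name"]])
```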

Now I rename the column beforehand, and for some reason new_Df2 has only 318 rows:

# Method 2: rename GDP's key column to 'Country' before merging
columns = ['Country Name','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']
GDP = GDP[columns]
GDP.columns = ['Country','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']

ScimEn = pd.read_excel('scimagojr-3.xlsx')

# Inner merges: keep only countries present in all three frames
newDf = (pd.merge(energy, GDP, how='inner', on='Country')
              .merge(ScimEn, how='inner', on='Country'))

# Outer merges: keep every country from every frame (318 rows here)
new_Df2 = pd.merge(energy, GDP, how='outer', on='Country')
new_Df2 = pd.merge(new_Df2, ScimEn, how='outer', on='Country')
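For comparison, a sketch of method 2 on the same hypothetical toy data. It uses DataFrame.rename instead of reassigning every column name; that is a substitution for brevity, not the code from the question. With a shared key and on='Country', the outer merge yields a single 'Country' column, and the GDP-only country keeps its name.

```python
import pandas as pd

# Hypothetical toy stand-ins for energy and GDP
energy = pd.DataFrame({"Country": ["France", "Japan"], "Energy Supply": [10, 20]})
GDP = pd.DataFrame({"Country Name": ["France", "Spain"], "2015": [2.4, 1.2]})

# Rename only the key column so both frames share the name "Country"
GDP = GDP.rename(columns={"Country Name": "Country"})
merged = pd.merge(energy, GDP, how="outer", on="Country")

# One key column, filled from whichever side the row came from
print(merged)
```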

I also ran new_Df2['Country'].nunique() for both methods: the first returned only 244 unique values, whereas the second returned 318.
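The nunique gap is consistent with missing keys: Series.nunique() skips NaN by default, so rows whose 'Country' is NaN add nothing to the count. A small sketch with invented values:

```python
import numpy as np
import pandas as pd

# Two real names plus two missing keys, as method 1 would leave behind
s = pd.Series(["France", "Japan", np.nan, np.nan])

unique_default = s.nunique()              # NaN is ignored
unique_with_nan = s.nunique(dropna=False) # NaN counted as one extra value
missing_keys = s.isna().sum()             # rows with no "Country" at all
print(unique_default, unique_with_nan, missing_keys)
```

Running new_Df2['Country'].isna().sum() on the method 1 result is one way to check how many rows lost their country name.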

The number of rows in newDf stayed the same for both methods. Why does the outer merge behave this way?
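A minimal end-to-end sketch of both methods on hypothetical data, assuming one country ("Spain") appears in GDP and ScimEn but not in energy. In method 1 that country's GDP row has NaN in 'Country', so it cannot match ScimEn and the ScimEn row is added as a separate row; in method 2 the two rows combine into one. Inner merges keep only countries matched in every frame, so both methods agree there.

```python
import pandas as pd

# Hypothetical toy data: "Spain" is in GDP and ScimEn but missing from energy
energy = pd.DataFrame({"Country": ["France", "Japan"], "Energy Supply": [10, 20]})
GDP = pd.DataFrame({"Country Name": ["France", "Spain"], "2015": [2.4, 1.2]})
ScimEn = pd.DataFrame({"Country": ["France", "Spain"], "Rank": [1, 2]})

# Method 1: different key names, so Spain's GDP row has NaN in "Country"
m1 = pd.merge(energy, GDP, how="outer", left_on="Country", right_on="Country Name")
m1 = pd.merge(m1, ScimEn, how="outer", on="Country")

# Method 2: rename first, so "Spain" stays in "Country" and matches ScimEn
GDP2 = GDP.rename(columns={"Country Name": "Country"})
m2 = pd.merge(energy, GDP2, how="outer", on="Country")
m2 = pd.merge(m2, ScimEn, how="outer", on="Country")

print(len(m1), len(m2))  # Spain is split over two rows in m1

# Inner joins drop every unmatched row, so the NaN keys never matter
i1 = (pd.merge(energy, GDP, how="inner", left_on="Country", right_on="Country Name")
      .merge(ScimEn, how="inner", on="Country"))
i2 = (pd.merge(energy, GDP2, how="inner", on="Country")
      .merge(ScimEn, how="inner", on="Country"))
print(len(i1), len(i2))
```

Each country that is in GDP and ScimEn but not in energy adds one extra row in method 1. That would account for 322 vs 318 if 4 such countries exist, though this is an inference that has not been checked against the real files.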

ali_cr8
  • Please add a minimal example, with actual data values filled in: https://stackoverflow.com/help/minimal-reproducible-example In addition to helping other folks replicate your results, you might find the problem yourself. (If so, post it as an answer!) Things I would like to check: Are there values in the "Country Name" column which don't appear in "Country"? Does either column have nulls? To further improve the post, consider adding https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html or other methods that you think are involved to your post. – Sarah Messer Jul 27 '20 at 20:09
  • I am new to this website and still getting used to it. I found the solution: if we merge on different column names, there will be 2 separate columns ('Country' and 'Country Name') instead of a single column containing all the merged countries. – ali_cr8 Jul 27 '20 at 21:03
  • Did you read https://stackoverflow.com/q/53645882/6692898 ? – RichieV Jul 27 '20 at 23:47
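Following the explanation in the comments, here is a sketch that keeps the original left_on/right_on merge and then collapses the two key columns into one with fillna. It also runs the checks asked about above (names in 'Country Name' missing from 'Country', and nulls in either key). The toy data is hypothetical; renaming before the merge, as in method 2, is the simpler alternative.

```python
import pandas as pd

# Hypothetical toy stand-ins for energy and GDP
energy = pd.DataFrame({"Country": ["France", "Japan"], "Energy Supply": [10, 20]})
GDP = pd.DataFrame({"Country Name": ["France", "Spain"], "2015": [2.4, 1.2]})

# Checks from the comments: unmatched names and null keys
only_in_gdp = set(GDP["Country Name"]) - set(energy["Country"])
print(only_in_gdp, energy["Country"].isna().sum(), GDP["Country Name"].isna().sum())

merged = pd.merge(energy, GDP, how="outer", left_on="Country", right_on="Country Name")

# Fill missing "Country" values from "Country Name", then drop the duplicate key
merged["Country"] = merged["Country"].fillna(merged["Country Name"])
merged = merged.drop(columns="Country Name")
print(merged)
```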

0 Answers