
I've created three data frames from three separate files (CSV and XLS), and I want to combine them into a single data frame with 20 columns and 15 rows. The code at the bottom does this (it is the final part of my script, where I merge the data frames I created earlier). However, something odd happens: the highest-ranking country is duplicated 3 times, and two of the 15 entries that should be there are missing. I'm not sure why.

I've set the index to be the same in each data frame!

So essentially my issue is that there are duplicate values showing up and other values being eliminated after I merge the data frames.

If someone could explain the mechanics of why this issue is occurring, I'd really appreciate it :)

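    import pandas as pd

    # Join the three frames on their shared index (pd.merge defaults to an inner join).
    merged = pd.merge(df_ScimEn, df_energy[ListEnergy], left_index=True, right_index=True)
    merged = pd.merge(merged, df_GDP[ListOfGDP], left_index=True, right_index=True)
    merged = merged[ListOfColumns]
    merged = merged.sort_values('Rank')
    merged = merged[merged['Rank'] < 16]  # keep ranks 1 to 15
    final = pd.DataFrame(merged)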
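A few checks that may help narrow this down, sketched on two small hypothetical frames (`left` and `right` are made-up names and values, not the real data). The sketch assumes the problem lies in the index labels: `Index.is_unique` reports repeated labels, `validate="one_to_one"` makes `pd.merge` raise instead of silently duplicating rows, and an outer merge with `indicator=True` shows which labels have no match.

```python
import pandas as pd

# Hypothetical frames: `right` repeats label 1 and has no label 2.
left = pd.DataFrame({"Rank": [1, 2, 3]}, index=[1, 2, 3])
right = pd.DataFrame({"GDP": [10, 11, 30]}, index=[1, 1, 3])

# 1) Are the index labels unique in each frame?
unique_flags = (left.index.is_unique, right.index.is_unique)
print(unique_flags)

# 2) Ask pandas to fail loudly if the merge is not one-to-one.
try:
    pd.merge(left, right, left_index=True, right_index=True, validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print("one-to-one check failed:", raised)

# 3) An outer merge with an indicator column shows labels present on one side only.
audit = pd.merge(left, right, left_index=True, right_index=True,
                 how="outer", indicator=True)
left_only = audit.index[audit["_merge"] == "left_only"].tolist()
print("labels missing from `right`:", left_only)
```

Running the same three checks on `df_ScimEn`, `df_energy` and `df_GDP` should show whether repeated or unmatched index labels are involved here.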

Example: a shorter version of what is happening.

Expected:

       A  B  C  D  J  K  L  R
    1  x  y  z  j  a  e  c  d
    2  b  c  d  l  a  l  c  d
    3  j  k  e  k  a  m  c  d
    4  d  k  c  k  a  n  h  d
    5  d  k  j  l  a  h  c  d

Generated after I run the code above (the 1 is repeated, and the 2 and the 3 are missing):

       A  B  C  D  J  K  L  R
    1  x  y  z  j  a  b  c  d
    1  x  y  z  j  a  b  c  d
    1  x  y  z  j  a  b  c  d
    4  d  k  c  k  a  b  h  d
    5  d  k  j  l  a  h  c  d


Example input (just the basic structure of each data frame, not runnable code):

    df1 = {[1:A,B,C],[2:A,B,C],[3:A,B,C],[4:A,B,C],[5:A,B,C]}
    df2 = {[1:J,K,L,M],[2:J,K,L,M],[3:J,K,L,M],[4:J,K,L,M],[5:J,K,L,M]}
    df3 = {[1:R,E,T],[2:R,E,T],[3:R,E,T],[4:R,E,T],[5:R,E,T]}

The index is the same in each data frame. The original frames have different numbers of rows and columns, but I've edited them down to form the final data frame. Each capital letter stands for a column name, with different values in each column.
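A minimal sketch of one way this symptom can arise. It rests on an assumption that is not confirmed by the real data: one frame contains the same index label several times and lacks other labels entirely. The frames and values below are made up; only the names `df1`, `df2`, `df3` and the column letters follow the structure above.

```python
import pandas as pd

# Hypothetical stand-ins for the three frames (values are made up).
df1 = pd.DataFrame({"A": list("xbjdd"), "B": list("yckkk")}, index=[1, 2, 3, 4, 5])
# Assumption: in df2 the label 1 occurs three times and labels 2 and 3 do not occur.
df2 = pd.DataFrame({"J": list("aaaaa"), "K": list("bbbbh")}, index=[1, 1, 1, 4, 5])
df3 = pd.DataFrame({"R": list("ddddd")}, index=[1, 2, 3, 4, 5])

# Same pattern as in the question: two chained index-on-index merges (inner joins).
merged = pd.merge(df1, df2, left_index=True, right_index=True)
merged = pd.merge(merged, df3, left_index=True, right_index=True)

print(merged)
# Row 1 of df1 is paired with each of the three label-1 rows of df2,
# and labels 2 and 3 are dropped because the inner join finds no match in df2.
```

Under that assumption, the merge behaves as designed: an inner join keeps only labels found on both sides and emits one output row per matching pair, so a repeated label multiplies rows and an absent label removes them.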
annatn998
  • Can you create some dummy input dataframes along with expected output. – Scott Boston Aug 01 '20 at 16:41
  • Just made some edits and put in dummy inputs :) – annatn998 Aug 01 '20 at 19:27
  • Your input dataframe code doesn't build a dataframe. Try using df1.to_dict() to create a text of your input dataframes. Also, review this post about how to create a good pandas question. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Scott Boston Aug 01 '20 at 19:33
  • Yeah sorry! I didn’t include that part but I did in my actual code. That’s just to show the basic structure of what’s in my data frames! – annatn998 Aug 01 '20 at 19:58
  • Your code and the example values are not consistent. I could not reproduce your error. – above_c_level Aug 02 '20 at 08:06
