I've created three data frames from three separate files (CSV and XLS), and I want to combine them into a single data frame with 20 columns and 15 rows. The code below mostly does this (it is the final part of my script, where I merge the data frames I created). However, something odd is happening: the highest-ranking country appears three times, and two of the 15 rows that should be there are missing. I'm not sure why.
I've set the index to be the same in each data frame!
So essentially, after I merge the data frames, some rows are duplicated and others are dropped.
If someone could explain the mechanics of why this is occurring, I'd really appreciate it :)
    import pandas as pd

    # Inner join (the default) of the three frames on their index
    merged = pd.merge(
        pd.merge(df_ScimEn, df_energy[ListEnergy], left_index=True, right_index=True),
        df_GDP[ListOfGDP], left_index=True, right_index=True)
    merged = merged[ListOfColumns]
    merged = merged.sort_values('Rank')
    merged = merged[merged['Rank'] < 16]  # keep ranks 1 to 15
    final = pd.DataFrame(merged)
Example (a shorter version of what is happening)

Expected:

       A B C D J K L R
    1  x y z j a e c d
    2  b c d l a l c d
    3  j k e k a m c d
    4  d k c k a n h d
    5  d k j l a h c d

Generated after I run the code above (row 1 is repeated, and rows 2 and 3 are missing):

       A B C D J K L R
    1  x y z j a b c d
    1  x y z j a b c d
    1  x y z j a b c d
    4  d k c k a b h d
    5  d k j l a h c d
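For reference, here is a minimal sketch that produces the same kind of output from made-up data. It assumes (I have not confirmed this for my real frames) that one frame carries a duplicated index label and lacks some of the other labels. The names `left`, `middle` and `right` and all the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames; none of these values come from the real files.
left = pd.DataFrame({"A": ["x", "b", "j", "d", "d"]}, index=[1, 2, 3, 4, 5])
# Label 1 appears three times here, and labels 2 and 3 are absent.
middle = pd.DataFrame({"J": ["a", "a", "a", "a", "a"]}, index=[1, 1, 1, 4, 5])
right = pd.DataFrame({"R": ["d", "d", "d", "d", "d"]}, index=[1, 2, 3, 4, 5])

# Same pattern as the merge above: inner join (the default) on the index.
step = pd.merge(left, middle, left_index=True, right_index=True)
merged = pd.merge(step, right, left_index=True, right_index=True)

print(merged)
```

With these toy frames the result keeps only the labels present in all three frames, and label 1 shows up once for each of its occurrences in `middle`.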
Example input (structure only, values omitted):

    import pandas as pd

    df1 = pd.DataFrame(index=[1, 2, 3, 4, 5], columns=['A', 'B', 'C'])
    df2 = pd.DataFrame(index=[1, 2, 3, 4, 5], columns=['J', 'K', 'L', 'M'])
    df3 = pd.DataFrame(index=[1, 2, 3, 4, 5], columns=['R', 'E', 'T'])

The index is the same in every data frame. The original frames had different numbers of rows and columns, but I trimmed them before building the final data frame. Each capital letter stands for a column name, and each column holds different values.
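The assumption that the indexes really are identical could be checked with something like the sketch below. The helper `report_index_problems` and the toy frames are hypothetical; in the real script it would be called with `df_ScimEn`, `df_energy` and `df_GDP` instead.

```python
import pandas as pd


def report_index_problems(frames):
    """For each named frame, list its duplicated index labels and the
    labels it lacks compared with the union of all frames' indexes."""
    all_labels = set()
    for df in frames.values():
        all_labels.update(df.index)

    report = {}
    for name, df in frames.items():
        # duplicated() flags every repeat after the first occurrence
        duplicated = sorted(set(df.index[df.index.duplicated()]))
        missing = sorted(all_labels - set(df.index))
        report[name] = {"duplicated": duplicated, "missing": missing}
    return report


# Hypothetical toy frames standing in for the three real ones.
frames = {
    "df1": pd.DataFrame({"A": range(5)}, index=[1, 2, 3, 4, 5]),
    "df2": pd.DataFrame({"J": range(5)}, index=[1, 1, 1, 4, 5]),
    "df3": pd.DataFrame({"R": range(5)}, index=[1, 2, 3, 4, 5]),
}
report = report_index_problems(frames)
print(report)
```

An empty `duplicated` and `missing` list for every frame would rule this explanation out. Passing `validate="one_to_one"` to `pd.merge` is another option; it raises an error when the merge keys are not unique on both sides.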