2

I simply merge two dataframes in common column:

df1

                  email                           account 
0                 555@i555.com                    555
1                 666@666.com                     666
2                 777@666.com                     Nan
3                 888@666.com                     999


df2 (i think ip is index here)

ip                account   
1.1.1.1           555
2.2.2.2           666
.
. 


df3= pd.merge(df1,df2,on='accountname')

in this case, I have missing data. How can I avoid this?

Jasmin
  • 91
  • 2
  • 8
  • df3= pd.merge(df1.dropna(),df2.dropna(),on='accountname', how="inner") – Yonas Kassa Oct 17 '17 at 22:11
  • you don't have accountname field, are you aware of that? – Yonas Kassa Oct 17 '17 at 22:28
  • Can yo please update with a better sample of data. Both inputs and outputs, use [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as an example of how to add dataframs to SO – DJK Oct 17 '17 at 22:32

1 Answers1

5
pd.merge(df1,df2,on='accountname',how='left')

Or

pd.merge(df1,df2,on='accountname',how='inner')

EDIT : Let us see your sample data, you merge str with int. that why all NaN

df1.applymap(type)
Out[96]: 
           email        account
0  <class 'str'>  <class 'str'>
1  <class 'str'>  <class 'str'>
2  <class 'str'>  <class 'str'>
3  <class 'str'>  <class 'str'>
df2.applymap(type)
Out[97]: 
               account
ip                    
1.1.1.1  <class 'int'>
2.2.2.2  <class 'int'>

How to do that:

Option1

Change str to numeric by using pd.to_numeric

df1.account=pd.to_numeric(df1.account,errors ='coerce')
df1.applymap(type)
Out[99]: 
           email          account
0  <class 'str'>  <class 'float'>
1  <class 'str'>  <class 'float'>
2  <class 'str'>  <class 'float'>
3  <class 'str'>  <class 'float'>

df1.merge(df2.reset_index(),on=['account'],how='left')


Out[101]: 
          email account       ip
0  555@i555.com     555  1.1.1.1
1   666@666.com     666  2.2.2.2
2   777@666.com     NaN      NaN
3   888@666.com     999      NaN

Option 2

We just change the df2.account to str(I prefer using the first pd.to-numeric)

df2.account=df2.account.astype(str)
df1.merge(df2.reset_index(),on=['account'],how='left')
Out[105]: 
          email account       ip
0  555@i555.com     555  1.1.1.1
1   666@666.com     666  2.2.2.2
2   777@666.com     Nan      NaN
3   888@666.com     999      NaN
BENY
  • 317,841
  • 20
  • 164
  • 234
  • I still have missing data `df1 : email account` and `df2 : account ip` so simple but I dont know why I have missing data from df2 after merge – Jasmin Oct 17 '17 at 21:55
  • @poço check my update, sorry for the late reply , just finish my own work , PS: if it work for you , you can upvoted and accept :) – BENY Oct 17 '17 at 22:36
  • @poço Ok , will step out of my office ~ :-) nice day – BENY Oct 17 '17 at 22:56