29

I have issues with the merging of two large Dataframes since the merge returns NaN values though there are fitting values. The two dfs are shaped like:

df1

Motor
2232
1524
2230
2230
2224
1516
1724
2224
1524
1624
1724
2224
2224
1524
1524
1516
1524
2224
1624
1724
1724
2224
2224

df2

Motor   Output Torque (mNm)
0615    0,17
1219    0,72
1516    0,59
1624    2
2230    4,7
2233    5,9
0816    0,7
1016    0,92
1024    1,6
1224    1,7
1319    1,4
1331    3,8
1516    0,97
1524    2,9
1717    2,2
1724    4,5
2224    6,8
2232    10
1336    3,6
1727    4,9
1741    8,8
2237    12
2642    26

I use the code:

MergeDat=MergeDat.merge(Motor,how="left")
print(MergeDat)

where MergeDat= df1 and Motor= df2

As result it returns:

  Motor  Output Torque (mNm)
0      2232                  NaN
1      1524                  NaN
2      2230                  NaN
3      2230                  NaN
4      2224                  NaN
5      1516                  NaN
6      1724                  NaN
7      2224                  NaN
8      1524                  NaN
9      1624                  NaN
10     1724                  NaN
11     2224                  NaN
12     2224                  NaN
13     1524                  NaN
14     1524                  NaN
15     1516                  NaN
16     1524                  NaN
17     2224                  NaN
18     1624                  NaN
19     1724                  NaN
20     1724                  NaN
21     2224                  NaN
22     2224                  NaN
23     1524                  NaN
24     1724                  NaN
25     1841                  NaN
26     2224                  NaN

I have no idea why the Output Torque column is not merged...

Appreciate any help!

2Obe
  • 3,570
  • 6
  • 30
  • 54
  • 1
    Could you add `df.dtypes` to the question? Suspecting if later is string column and former is numeric type. – Zero Oct 15 '17 at 11:17
  • I had differnt dtypes for the "Motor" column. Fixed this issue by applying the answer of [jezrael](https://stackoverflow.com/users/2901002/jezrael) – 2Obe Oct 15 '17 at 11:24

4 Answers4

46

You need same dtype of joined columns:

#convert first or second to str or int
MergeDat['Motor'] = MergeDat['Motor'].astype(str)
#Motor['Motor'] = Motor['Motor'].astype(str)

#MergeDat['Motor'] = MergeDat['Motor'].astype(int)
Motor['Motor'] = Motor['Motor'].astype(int)

#convert first or second to str or int
#MergeDat['Motor'] = MergeDat['Motor'].astype(str)
Motor['Motor'] = Motor['Motor'].astype(str)

MergeDat['Motor'] = MergeDat['Motor'].astype(int)
#Motor['Motor'] = Motor['Motor'].astype(int)


MergeDat=MergeDat.merge(Motor,how="left")
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 38
    Would be great if pandas would print a warning when dtypes are different. – Soerendip Apr 13 '18 at 12:10
  • @Sören - Yes, maybe in future version of pandas. Btw, last of improvement of merge is [this](https://pandas.pydata.org/pandas-docs/stable/merging.html#checking-for-duplicate-keys()), but same dtypes still not check :( – jezrael Apr 13 '18 at 12:15
  • I would like to add that I experienced an issue trying to merge columns that were of `object` type together. I had to case them as `str` in order for their join to work. – gr1zzly be4r Sep 26 '18 at 17:59
  • 7
    In pandas 0.25.1, I'm having this exact problem, but `.astype(str)` on both columns (`object` containing DOIs) does not change anything. – Katrin Leinweber Oct 14 '19 at 14:10
  • @KatrinLeinweber - hmmm, object means strings, so try convert to numeric - [link](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) – jezrael Oct 14 '19 at 14:11
  • Thanks for the link :-) DOIs are unique strings (e.g. `10.1016/j.ribaf.2019.101069`), though, so neither numeric works (`ValueError`) nor are categorical types appropriate. However, when I cast the colum which `NaN`-ed after the merge to `astype("category")` beforehand, the values were preserved. Very strange... – Katrin Leinweber Oct 14 '19 at 14:59
  • There is a chance if you merge `left` and there are NO common values, it'll look like it found matching records, but you'll have NaN's on the right dataframe because it basically did a `pd.concat(axis=1)` since it didn't find anything to merge. – kevin_theinfinityfund Mar 23 '21 at 05:29
  • @KatrinLeinweber, try .astype('string'). See https://stackoverflow.com/questions/60581893/convert-object-data-type-to-string-issue-in-python . It didn' t solve the problem of NaN, but at least the dtype has been changed. – toliveira Nov 28 '22 at 13:55
  • @KatrinLeinweber I am having the same issue with DOIs. All NaNs on the right df after a left join. Did you find a solution? Thanks – sheth7 Apr 20 '23 at 14:46
  • @KatrinLeinweber and others - I resolved the issue. So it so turns our that the join function uses indexes to join (even though you specify 'on'). You can resolve it by first setting the doi column as index for both dataframes and then joining type=left without specifying the 'on'. so something like this -- combined_df = left_df.set_index('l_doi').join(right_df.set_index('r_doi'), how='left', r_suffix='_right') . This worked for me. – sheth7 Apr 20 '23 at 21:50
3

In my case, it was because I haven't reset the index after splitting the data frame, using df.reset_index(drop=True). Resetting the index of the first data frame enabled merging a second data frame to it.

Pavindu
  • 2,684
  • 6
  • 44
  • 77
1

For me I thought I had the same index but but there was a trailing space in the index for the first dataframe. Removing/trimming that helped.

Erik Thysell
  • 1,160
  • 1
  • 14
  • 31
0

Having some NaN's in the key column(s) is the usual culprit from my experience. Try at least the 2nd of these 3 lines on both df's (where unique_id is the key column used for merging) and see if it helps:

print(df[unique_id].duplicated().sum())
df.drop_duplicates(subset=unique_id, inplace=True)
assert(df[unique_id].duplicated().sum() == 0)
mirekphd
  • 4,799
  • 3
  • 38
  • 59