Pandas Merge returns NaN

Question

I have issues with the merging of two large Dataframes since the merge returns NaN values though there are fitting values. The two dfs are shaped like:

df1

df2

Motor   Output Torque (mNm)
0615    0,17
1219    0,72
1516    0,59
1624    2
2230    4,7
2233    5,9
0816    0,7
1016    0,92
1024    1,6
1224    1,7
1319    1,4
1331    3,8
1516    0,97
1524    2,9
1717    2,2
1724    4,5
2224    6,8
2232    10
1336    3,6
1727    4,9
1741    8,8
2237    12
2642    26

I use the code:

MergeDat=MergeDat.merge(Motor,how="left")
print(MergeDat)

where MergeDat= df1 and Motor= df2

As result it returns:

  Motor  Output Torque (mNm)
0      2232                  NaN
1      1524                  NaN
2      2230                  NaN
3      2230                  NaN
4      2224                  NaN
5      1516                  NaN
6      1724                  NaN
7      2224                  NaN
8      1524                  NaN
9      1624                  NaN
10     1724                  NaN
11     2224                  NaN
12     2224                  NaN
13     1524                  NaN
14     1524                  NaN
15     1516                  NaN
16     1524                  NaN
17     2224                  NaN
18     1624                  NaN
19     1724                  NaN
20     1724                  NaN
21     2224                  NaN
22     2224                  NaN
23     1524                  NaN
24     1724                  NaN
25     1841                  NaN
26     2224                  NaN

I have no idea why the Output Torque column is not merged...

Appreciate any help!

Could you add `df.dtypes` to the question? Suspecting if later is string column and former is numeric type. — Zero, Oct 15 '17 at 11:17
I had differnt dtypes for the "Motor" column. Fixed this issue by applying the answer of [jezrael](https://stackoverflow.com/users/2901002/jezrael) — 2Obe, Oct 15 '17 at 11:24

jezrael · Accepted Answer · 2017-10-15T11:22:50.737

46

You need same dtype of joined columns:

#convert first or second to str or int
MergeDat['Motor'] = MergeDat['Motor'].astype(str)
#Motor['Motor'] = Motor['Motor'].astype(str)

#MergeDat['Motor'] = MergeDat['Motor'].astype(int)
Motor['Motor'] = Motor['Motor'].astype(int)

#convert first or second to str or int
#MergeDat['Motor'] = MergeDat['Motor'].astype(str)
Motor['Motor'] = Motor['Motor'].astype(str)

MergeDat['Motor'] = MergeDat['Motor'].astype(int)
#Motor['Motor'] = Motor['Motor'].astype(int)


MergeDat=MergeDat.merge(Motor,how="left")

edited Oct 15 '17 at 11:22

answered Oct 15 '17 at 11:21

jezrael

822,522
95
1,334
1,252

38

Would be great if pandas would print a warning when dtypes are different. – Soerendip Apr 13 '18 at 12:10
@Sören - Yes, maybe in future version of pandas. Btw, last of improvement of merge is [this](https://pandas.pydata.org/pandas-docs/stable/merging.html#checking-for-duplicate-keys()), but same dtypes still not check :( – jezrael Apr 13 '18 at 12:15
I would like to add that I experienced an issue trying to merge columns that were of `object` type together. I had to case them as `str` in order for their join to work. – gr1zzly be4r Sep 26 '18 at 17:59
7

In pandas 0.25.1, I'm having this exact problem, but `.astype(str)` on both columns (`object` containing DOIs) does not change anything. – Katrin Leinweber Oct 14 '19 at 14:10
@KatrinLeinweber - hmmm, object means strings, so try convert to numeric - [link](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) – jezrael Oct 14 '19 at 14:11
Thanks for the link :-) DOIs are unique strings (e.g. `10.1016/j.ribaf.2019.101069`), though, so neither numeric works (`ValueError`) nor are categorical types appropriate. However, when I cast the colum which `NaN`-ed after the merge to `astype("category")` beforehand, the values were preserved. Very strange... – Katrin Leinweber Oct 14 '19 at 14:59
There is a chance if you merge `left` and there are NO common values, it'll look like it found matching records, but you'll have NaN's on the right dataframe because it basically did a `pd.concat(axis=1)` since it didn't find anything to merge. – kevin_theinfinityfund Mar 23 '21 at 05:29
@KatrinLeinweber, try .astype('string'). See https://stackoverflow.com/questions/60581893/convert-object-data-type-to-string-issue-in-python . It didn' t solve the problem of NaN, but at least the dtype has been changed. – toliveira Nov 28 '22 at 13:55
@KatrinLeinweber I am having the same issue with DOIs. All NaNs on the right df after a left join. Did you find a solution? Thanks – sheth7 Apr 20 '23 at 14:46
@KatrinLeinweber and others - I resolved the issue. So it so turns our that the join function uses indexes to join (even though you specify 'on'). You can resolve it by first setting the doi column as index for both dataframes and then joining type=left without specifying the 'on'. so something like this -- combined_df = left_df.set_index('l_doi').join(right_df.set_index('r_doi'), how='left', r_suffix='_right') . This worked for me. – sheth7 Apr 20 '23 at 21:50

score 3 · Answer 2 · answered May 28 '21 at 09:21

3

In my case, it was because I haven't reset the index after splitting the data frame, using df.reset_index(drop=True). Resetting the index of the first data frame enabled merging a second data frame to it.

answered May 28 '21 at 09:21

Pavindu

2,684
6
44
77

score 1 · Answer 3 · answered May 11 '23 at 07:12

1

For me I thought I had the same index but but there was a trailing space in the index for the first dataframe. Removing/trimming that helped.

answered May 11 '23 at 07:12

Erik Thysell

1,160
1
14
31

mirekphd · Answer 4 · 2021-06-17T10:36:35.063

0

Having some NaN's in the key column(s) is the usual culprit from my experience. Try at least the 2nd of these 3 lines on both df's (where unique_id is the key column used for merging) and see if it helps:

print(df[unique_id].duplicated().sum())
df.drop_duplicates(subset=unique_id, inplace=True)
assert(df[unique_id].duplicated().sum() == 0)

edited Jun 17 '21 at 10:36

answered Jun 16 '21 at 20:00

mirekphd

4,799
3
38
59

Pandas Merge returns NaN

4 Answers4

Linked

Related