
The documentation for DataFrame.merge states that the suffixes parameter sets the column-name suffixes for the left and right DataFrame respectively (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html).

In my example, however, the suffixes appear to be sorted alphabetically before they are applied.

import pandas as pd

A = pd.DataFrame({'prodID':[0,1,2,3,4],'units':[5,6,7,8,9]})
B = pd.DataFrame({'prodID':[0,1,2,3,4],'units':[10,11,12,13,14]})

# B is the left DataFrame, A is the right one
merged = pd.merge(B,A,on='prodID',suffixes={'_b','_a'})
print(merged)

Actual result:

   prodID  units_a  units_b
0       0       10        5
1       1       11        6
2       2       12        7
3       3       13        8
4       4       14        9

What I expected to get:

   prodID  units_b  units_a
0       0       10        5
1       1       11        6
2       2       12        7
3       3       13        8
4       4       14        9

I don't care about the order of the columns after the merge. But when I use suffixes, I want the left DataFrame (B in my example) to get the left suffix, '_b', and the right DataFrame to get the right suffix.

If the suffixes end up on the wrong columns, it is easy to select the wrong column later, which can cause a lot of bugs.

Jens Goemaere
  • in the `pd.merge` arguments, switch the suffix order from `{'_b', '_a'}` to `{'_a', '_b'}` and it should work. – anderw Aug 26 '19 at 15:21
  • My issue was caused by passing a set, {'_b','_a'}, as the suffixes argument (the braces create a set, not a dictionary). A set is unordered, so the suffixes are not guaranteed to be applied in the order they were written. The correct way to pass the suffixes is as a tuple () or a list []. – Jens Goemaere Aug 29 '19 at 14:32
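
For reference, the fix described in the last comment can be sketched as follows. This is a minimal example that reuses the A and B frames from the question and only changes the suffixes argument from a set to a tuple. Recent pandas releases, to my knowledge, warn about or reject a set for suffixes, so the exact behaviour of the original call may depend on the installed version.

```python
import pandas as pd

A = pd.DataFrame({'prodID': [0, 1, 2, 3, 4], 'units': [5, 6, 7, 8, 9]})
B = pd.DataFrame({'prodID': [0, 1, 2, 3, 4], 'units': [10, 11, 12, 13, 14]})

# A tuple preserves its order: the first suffix is applied to the
# left frame (B), the second to the right frame (A).
merged = pd.merge(B, A, on='prodID', suffixes=('_b', '_a'))
print(merged)
```

With the tuple, units_b holds the values from B (10 to 14) and units_a holds the values from A (5 to 9), which matches the expected result in the question.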
