1

So this works as expected:

df1 = pd.DataFrame({'date':[123,456],'price1':[23,34]}).set_index('date')
df2 = pd.DataFrame({'date':[456,789],'price2':[22,32]}).set_index('date')
df1.join(df2, how='outer')

      price1  price2
date                
123     23.0     NaN
456     34.0    22.0
789      NaN    32.0

But if I don't set the index, it causes an error:

df1 = pd.DataFrame({'date':[123,456],'price1':[23,34]})
df2 = pd.DataFrame({'date':[456,789],'price2':[22,32]})
df1.join(df2, on='date', how='outer')

ValueError: columns overlap but no suffix specified: Index(['date'], dtype='object')

Why is this, and am I incorrect for supposing they should give the same result?

Luigi Plinge
  • 50,650
  • 20
  • 113
  • 180
  • Does this answer your question? [Pandas join issue: columns overlap but no suffix specified](https://stackoverflow.com/questions/26645515/pandas-join-issue-columns-overlap-but-no-suffix-specified) – Trenton McKinney Jan 23 '21 at 21:27
  • What I don't understand is what's the point of "joining" on a column if you then output separate columns in the result? It doesn't seem to work as I'd expect (or as it would in SQL). Whether the column you're joining on is the index or not doesn't seem relevant to me, but maybe I'm misunderstanding something. – Luigi Plinge Jan 23 '21 at 21:39
  • you are attempting to join the `date` column of `df2` on the index of `df1`. `.join` is for joining on indices, `.merge` provides more options for joining on different combinations of columns or indices. – Trenton McKinney Jan 23 '21 at 21:42
  • Ah that's good to know. Is it possible to join on the `date` column of both, without changing their indexes? – Luigi Plinge Jan 23 '21 at 21:43
  • Yes, use `.merge` instead of `.join`. `pd.merge(df1, df2, on='date')`. – Trenton McKinney Jan 23 '21 at 21:44
  • Also see [SO: Pandas Merging 101](https://stackoverflow.com/q/53645882/7758804) – Trenton McKinney Jan 23 '21 at 21:47

1 Answers1

2

If you want just to add the two dataframes and not joining by a certain column, you need to add suffixes so not to create columns with the same name. e.g.:

df1.join(df2, how='outer', lsuffix='_left', rsuffix='_right')

if you want to join on the column you should use merge:

df1.merge(df2, how='outer')
gtomer
  • 5,643
  • 1
  • 10
  • 21