
I need to join two pandas DataFrames whose column names partially overlap: both have Col1 and Col2, while the other columns do not overlap.

I get the following error:

ValueError: Indexes have overlapping values: Index(['Col1','Col2']

Joining is done as follows:

df1.join([df2], how='inner')
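
A minimal reproduction on made-up frames. Passing a list to join appears to go through a column-wise concatenation that rejects duplicate column labels, which would explain why the message talks about overlapping "Indexes" even though the clash is in the column names. Passing df2 directly, without suffixes, also fails, with a different message.

```python
import pandas as pd

# Made-up frames with unique row indexes; Col1 and Col2 exist in both.
df1 = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "A": [5, 6]})
df2 = pd.DataFrame({"Col1": [10, 20], "Col2": [30, 40], "B": [7, 8]})

errors = {}

# List form, as in the question: duplicate column labels are rejected.
try:
    df1.join([df2], how="inner")
except ValueError as exc:
    errors["list"] = str(exc)

# Single-frame form: overlapping columns need lsuffix/rsuffix.
try:
    df1.join(df2, how="inner")
except ValueError as exc:
    errors["single"] = str(exc)

for form, message in errors.items():
    print(form, "->", message)
```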

Of course, I can manually drop Col1 and Col2 from one of the DataFrames, but I wonder if there is a better solution. I am using pandas version 0.25.
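
A minimal sketch of the dropping workaround, generalized so the shared column names are detected instead of hard-coded. The frames are made up; dropping the shared columns from the right-hand frame means the left frame's values are the ones kept.

```python
import pandas as pd

# Made-up sample frames; Col1 and Col2 exist in both.
df1 = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "A": [5, 6]})
df2 = pd.DataFrame({"Col1": [10, 20], "Col2": [30, 40], "B": [7, 8]})

# Column labels that df2 shares with df1.
overlap = df2.columns.intersection(df1.columns)

# Remove the shared columns from the right frame, then join on the index.
result = df1.join(df2.drop(columns=overlap), how="inner")
print(result)
```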

I am looking for something like this (or another option that would let me avoid dropping the columns manually):

df1.join([df2], how='inner', take_overlapping_columns_from_left=True)

Is this possible, or should I proceed with dropping the columns?

Fluxy

2 Answers


To avoid the duplicate values in the index, we can tell the concat() function to ignore the index and use the default integer index instead.

Something like:

import pandas as pd

pd.concat([df1, df2], ignore_index=True)
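
To show what this call produces, here is a small run on made-up frames. With the default axis=0, pd.concat appends the rows of df2 below those of df1 and aligns on column names, so Col1 and Col2 become shared columns while A and B are filled with NaN where a frame lacks them; ignore_index=True renumbers the rows from 0. sort=False is passed only to keep the column order of appearance.

```python
import pandas as pd

# Made-up frames: Col1/Col2 are shared, A and B are not.
df1 = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "A": [5, 6]})
df2 = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "B": [7, 8]})

# Stack rows vertically and replace the original row labels with 0..n-1.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)
print(stacked)
```

This stacks rows rather than placing the frames side by side, so it is a different operation from the inner join in the question.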
hemanta
  • Is it possible to do the same using `join`? – Fluxy Nov 28 '19 at 20:31
  • And what if I have 4 DataFrames? Can I do `pd.concat([df1, df2, df3, df4], ignore_index=True)`? – Fluxy Nov 28 '19 at 20:33
  • I am not sure about multiple DataFrames. This page may give you more insight: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html. I also found this one very helpful: https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns – hemanta Nov 28 '19 at 20:38
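
pd.concat accepts a list of any length, so the four-frame call in the comment is valid, but it stacks all rows. For joining several frames side by side while keeping the overlapping columns from the left-most frame, a minimal sketch using functools.reduce follows. The helper join_keep_left is hypothetical (not a pandas API) and the frames are made up.

```python
from functools import reduce

import pandas as pd


def join_keep_left(left, right, how="inner"):
    # Hypothetical helper: drop from `right` every column `left` already has,
    # so values from the left-most frame are the ones kept.
    overlap = right.columns.intersection(left.columns)
    return left.join(right.drop(columns=overlap), how=how)


# Four made-up frames that all contain Col1 and Col2.
frames = [
    pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "A": [5, 6]}),
    pd.DataFrame({"Col1": [10, 20], "Col2": [30, 40], "B": [7, 8]}),
    pd.DataFrame({"Col1": [100, 200], "Col2": [300, 400], "C": [9, 10]}),
    pd.DataFrame({"Col1": [0, 0], "Col2": [0, 0], "D": [11, 12]}),
]

# reduce() applies the helper pairwise: ((f0 + f1) + f2) + f3.
joined = reduce(join_keep_left, frames)
print(joined)
```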

As mentioned in your comment, if you want to join multiple DataFrames with overlapping column names, it is best to rename those columns or use suffixes (the lsuffix/rsuffix parameters of join, or suffixes in merge).
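
A short sketch of both options on made-up frames. The suffix names and the renamed labels (Col1_df2, Col2_df2) are arbitrary choices for illustration. The lsuffix/rsuffix parameters are meant for joining a single DataFrame; with a list of frames they are not supported, so renaming is the option that carries over to the multi-frame case.

```python
import pandas as pd

# Made-up frames; Col1 and Col2 exist in both.
df1 = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "A": [5, 6]})
df2 = pd.DataFrame({"Col1": [10, 20], "Col2": [30, 40], "B": [7, 8]})

# Option 1: join renames the clashing columns on both sides.
with_suffixes = df1.join(df2, how="inner", lsuffix="_left", rsuffix="_right")
print(with_suffixes.columns.tolist())

# Option 2: rename the clashing columns on one side before joining.
renamed = df1.join(
    df2.rename(columns={"Col1": "Col1_df2", "Col2": "Col2_df2"}), how="inner"
)
print(renamed.columns.tolist())
```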

WisdomSeeker