Comparing from Multiple Data Sheets within pandas (python)

Question

I would like to make a new data frame from different excel files.

I have a unique code in one data sheet that shows location and another data sheet containing a person's name with a unique code. What I would like to do is show the location of a person by comparing their unique code.

Similar to the concept below:

Data Frame 1.xlsx

 code = 1234, Location = serpong
 code = 9876, Location = cisauk

Data Frame 2.xlsx :

code = 1234, Name = Dino
code = 9876, Name = ventura

expected data frame.xlsx (result by python) :

Name = dino, Location = serpong
Name = ventura, Location = cisauk

Is it possible to do this with pandas? Or do I need another library to do this?

Also look at [merge](https://stackoverflow.com/questions/53645882/pandas-merging-101) — anky, May 12 '19 at 05:31

MichaelD · Accepted Answer · 2019-05-13T15:27:57.427

Yes, here's an example. Assuming that the XLSX workbook you are reading from has two sheets each with the data (obviously if you are reading from two different Excel workbooks, then you'll need to modify the second read_excel).

>>> import pandas as pd
>>> df1 = pd.read_excel('c:/temp/book1.xlsx',sheet_name='Sheet1', index_col='code')
>>> df1
     Location
code
1234  serpong
9876   cisauk
>>> df2 = pd.read_excel('c:/temp/book1.xlsx',sheet_name='Sheet2', index_col='code')
>>> df2
         Name
code
1234     Dino
9876  ventura
>>> df3 = df1.join(df2)
>>> df3
     Location     Name
code
1234  serpong     Dino
9876   cisauk  ventura

EDIT: Adding additional column in the initial Dataframe creation.

>>> import pandas as pd
>>> df1 = pd.read_excel('c:/temp/book1.xlsx', sheet_name='Sheet1', index_col='code')
>>>df1

     Location  Foo
code
1234  serpong    1
9876   cisauk    2
>>> df2 = pd.read_excel('c:/temp/book1.xlsx', sheet_name='Sheet2', index_col='code')
>>> df2

         Name Foo
code
1234     Dino   a
9876  ventura   b
>>> df3 = df1.join(df2, lsuffix='_df1', rsuffix='_df2')
>>> df3

     Location  Foo_df1     Name Foo_df2
code
1234  serpong        1     Dino       a
9876   cisauk        2  ventura       b

but it doesn't work, ther's "Pandas join issue: columns overlap but no suffix specified" what should i do? — Charisma Bathara, May 13 '19 at 03:51
There are two reasons why this would happen given the data you provided. Either you didn't specify that 'code' is the column which identifies each row (i.e. `index_col="code"` or you have more columns in your dataset. Regardless, I've added a single column in both dataframes that have the same name and show in the update how to address using 'lsuffix' and 'rsuffix'. See [docs](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html?highlight=join#pandas.DataFrame.join) — MichaelD, May 13 '19 at 15:19

Comparing from Multiple Data Sheets within pandas (python)

1 Answers1