Merge data from several excel sheets into one

Question

I have city addresses I need to put together and find all duplicates. I got to a point where I can find all the duplicates in excel files, easy so far. But I have to change each city in the code to search each file. How do I search each file without having to change the city in the code and then save it of course. I want to merge them which I can but I can't figure out why they create their own 3 columns and don't just merge with the 'A','B' & 'C' columns already there. Maybe Pandas isn't the best library to do this with and a better one can be suggested.

import os

file_df = pd.read_excel("Kermit.xlsx")

file_df.duplicated(subset = 'Address', keep = False)

file_df.drop_duplicates(subset= 'Address',inplace= True)

City = file_df.to_excel("Kermit2.xlsx", index= False)

# path = os.getcwd()
# files = os.listdir(path)
# print(files)

# files_xlsx = [f for f in files if f[-4:] == 'xlsx']

# print(files_xlsx)

# df = pd.DataFrame()

# for f in files_xlsx:
#    data = pd.read_excel(f, 'Sheet1')
#    df = df.append(data)`import os

Do you realise that the variable `City` can only have a value of `None` since `to_excel()` doesn't return anything? Why should you care about overwriting it? Just run it in a for-loop. — NotAName, Oct 25 '20 at 23:18
Can you explain in more detail what you mean by: "they create their own 3 columns and don't just merge with the 'A','B' & 'C' columns already there." — NotAName, Oct 25 '20 at 23:21
The files have three columns, Name, Address and Phone. When I merge two or more to a new file the 'A', 'B' & 'C' columns from the second city will go and create a 'D', 'E' & 'F' columns, so on and so on for all 12 or so cities I have. Instead what I want is for everything to go under the same three columns since its all the same data type of course. — aarongc43, Oct 25 '20 at 23:31
_Instead what I want is for everything to go under the same three columns since its all the same data type of course._ Take a look at https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html. — AMC, Oct 25 '20 at 23:44

score 0 · Accepted Answer · answered Oct 26 '20 at 00:48

It's hard to answer your question properly without knowing what your data is like and what your file naming is. I'll assume that all your excel files are in the same folder and they have same 3 columns of data.

In that case all you need to do is:

import os
import pandas as pd

source_folder = 'your_folder_location'
files = [os.path.join(source_folder, x) for x in os.listdir(source_folder) if x.endswith('xlsx')]

data = pd.append(pd.read_excel(x) for x in files)
data.drop_duplicates(subset=['Address'], inplace=True)
data.to_excel('blablabla.xlsx', index=False)

Merge data from several excel sheets into one

1 Answers1