-1
import pandas as pd
 from os import listdir

 from os.path import isfile , join

folder = "c:/sheets"

excel_names = [f for f in listdir(folder) if isfile(join(folder, f))]
 print(excel_names)
 excel_files = []
for item in excel_names:
 item = folder + item excel_files.append(item)
 # read them in
excels = [pd.ExcelFile(name) for name in excel_files]
# turn them into dataframes
 frames = [x.parse(x.sheet_names[0], header=None, index_col=None) 
for x in excels]
# delete the first row for all frames except the first
#  i.e. remove the header row -- assumes it's the first

 frames[1:] = [df[1:] for df in frames[1:]]
# concatenate them..
 combined = pd.concat(frames) combined.to_excel(folder+"/combined.xlsx", header=False, index=False)

This is the code I wrote so far but it seems to work for one sheet only in multiple workbook. What should I do?

Word Rearranger
  • 1,306
  • 1
  • 16
  • 25

2 Answers2

1

I think this might help.

I have 3 Excel files in a directory which have data which looks like this

enter image description here

Some of the spreadsheets have multiple sheets. All have the same shape of data, i.e. the same columns and data types.

Import the packages, i'm using glob, just a personal preference for getting file lists

import pandas as pd
import glob

Iterate through the directory grabbing each file, grabbing all the sheets in the spreadsheet and using this get a dataframe for each sheet. Each dataframe gets written to a list.

ls_df = []

for file in glob.glob('/location/to/file/*'):
    print(file)
    for sheet in pd.ExcelFile(file).sheet_names:
        ls_df.append(pd.read_excel(file,sheet))

Then use the list to concat to make a dataframe of all the spreadsheets and their sheets.

df = pd.concat(ls_df)

You end up with a dataframe which looks like this

enter image description here

Export to excel

df.to_excel('test_4.xlsx', index = 0 )

enter image description here

the_good_pony
  • 490
  • 5
  • 12
  • there is an error C:/Users/mohamed.h.mohamad/.spyder-py3/temp.py:9: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. – Mohamed Hassan Oct 30 '19 at 20:43
  • Hey @MohamedHassan sounds like your spreadsheets may not have exactly the same columns. This answer might help https://stackoverflow.com/questions/50501787/python-pandas-user-warning-sorting-because-non-concatenation-axis-is-not-aligne – the_good_pony Oct 30 '19 at 21:01
0

IIUC,

This will only work if your dataframe has the same shape (column wise) for each Sheet else you'll need to modify and adapt the code to your use-case.

Step 1: list xlsx files.

from pathlib import Path
excels = [f for f in Path('c:/sheets').glob('*.xlsx')]

Step 2: Create a list of ExcelFile objects.

xls = [pd.ExcelFile(f) for f in excels]

Step 3: Create a dictionary of key,value pairs split by worksheets.

d = {}

for file in xls:
    for sheet in file.sheet_names:
        if sheet not in d:
            d[sheet] = [] # creates the key which is unique
        else:
            pass # ensures we don't overwrite 
        df = pd.read_excel(file,sheet_name=sheet)
        d[f'{sheet}'].append(df)

Finally, concat the sheets into a dataframe_dictionary:

df_dict = {}
for k,v in d.items():
    df_dict[f'{k}'] = pd.concat(v)

print(df_dict['Sheet1'])
      DatA   Vals
0     A  16577
1     B  37287
2     C  32761
0     A  38707
1     B  17301
..  ...    ...
1     B   6292
2     C  49310
0     A   4401
1     B  12170
2     C   8978

[150 rows x 2 columns]

Tested on 50 xlsx files with two sheets both of which had the same number of columns rows but differing data.

Can't see it not working on your sample case.

Umar.H
  • 22,559
  • 7
  • 39
  • 74