Python - Multiple XLSX/XLSM to CSV

Question

I have a folder with multiple *.xlsm-files for example "80-384sec -_november_2017.xlsm", "80-384sec -_december_2017.xlsm", ..... I can read a specific sheet from this file with python like this:

df_xlsmtocsv = pd.read_excel('80-384sec -_november_2017.xlsm', 'sheet3, index_col=None )

And my first solution is something like this:

for file in os.listdir():
    if file.endswith(".xlsm"):
        df_qctocsv = pd.read_excel(file, 'sheet3', index_col=None )
        print(df_qctocsv)
        with open('all.csv', 'a') as f:
            df_qctocsv.to_csv(f, index=True, header=None)

How can I read multiple xlsm-files and append all new messages to a csv-file and order this for example by first column?

After converting I want to copy all this rows from the csv-file to a new sheet in an existing file "messages.xlsx".

What do you mean by messages? Do you want to append a particular column from each of the files to a dataframe? Or do you want to merge certain sheets from every file and save all this to a new csv file? — jjj, Nov 27 '17 at 09:36

jjj · Accepted Answer · 2017-11-27T14:42:35.783

1

There is a lot of ways in which you can join data frames. One possible way is this:

import pandas as pd

df = pd.DataFrame()
for file in os.listdir():
    if file.endswith(".xlsm"):
        df_tmp = pd.read_excel(file, 'Sheet1', index_col=None)
        df = df.append(df_tmp)

df.to_csv('all.csv')

EDIT: If you want to add your dataframe to an existing xlsx file (adapted from here):

from openpyxl import load_workbook

book = load_workbook('<your-xlsx-file>')
wrt = pd.ExcelWriter('<your-output-file>', engine='openpyxl')
wrt.book = book
wrt.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(wrt, '<name-of-your-sheet>')
wrt.save()

edited Nov 27 '17 at 14:42

answered Nov 27 '17 at 10:17

jjj

1,067
1
15
30

But if I want to add the new csv file to an existing xlsx-file but in a new sheet, it overwrites me the file/existing sheet – TigerClaw Nov 27 '17 at 12:02
@TigerClaw could you update your question with the relevant part of your code? It should work if you pass an existing `ExcelWriter` object. See notes [here](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html). – jjj Nov 27 '17 at 12:20
@TigerClaw also have a look at this answer: https://stackoverflow.com/a/20221655/4190925 – jjj Nov 27 '17 at 12:56
No, if I have a file with sheet1 and sheet2 and use df1.to_excel(writer,'sheet1') with Excel Writer it gaves me a file only with sheet1. Existing sheet2 ist aftere that not available. – TigerClaw Nov 27 '17 at 13:20
@TigerClaw I believe it was explained in the answer I linked to. I've added a modified version of it to my answer. – jjj Nov 27 '17 at 13:42
I solved this with two different xlsx files. But I write this two files to another file as sheets I lose my color-format ... – TigerClaw Nov 28 '17 at 08:04
@TigerClaw No wonder you loose formatting if you convert them to dataframes. If you want to keep your formatting have a look at the `openpyxl` package, but then I don't think you can use `pandas` for merging. Also, if my answer helped you, please accept it. – jjj Nov 28 '17 at 17:15

Python - Multiple XLSX/XLSM to CSV

1 Answers1