3

I'm trying to loop through .xlsb file types in a folder and convert them to .csv in Python 3+ and Windows 10 and have pieced together the code below with help from SO. I want to save the new .csv as the original .xlsb name but am having issues - I have this so far:

import os
import glob
import win32com.client

path = r'C:\Users\folder\Desktop\Test Binary'
all_files_test = glob.glob(os.path.join(path, "*.xlsb"))
for file in all_files_test:
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    doc = excel.Workbooks.Open(file)
    doc.SaveAs(Filename="C:\\Users\\folder\\Desktop\\Test Binary\\file.csv",FileFormat = 6) #overwrites file each time, need to substitute 'file'
    doc.Close(True)
    excel.Quit()
excel.Quit()

Which of course just overwrites each new iteration each time as 'file.csv'. How can I substitute the .xlsb name for each .csv name to SaveAs separate files? Thanks in advance.

Hatt
  • 689
  • 4
  • 9
  • 23

2 Answers2

3

Simply use str.replace on file variable to change extension. And consider wrapping in try/except to cleanly release COM objects regardless of error or not.

path = r'C:\Users\folder\Desktop\Test Binary'

all_files_test = glob.glob(os.path.join(path, "*.xlsb"))

for file in all_files_test:    
    try: 
        excel = win32com.client.Dispatch("Excel.Application")
        excel.Visible = False
        doc = excel.Workbooks.Open(file)

        csv_name = file.replace('.xlsb', '.csv')

        doc.SaveAs(Filename = csv_name, FileFormat = 6)
        doc.Close(True)
        excel.Quit()

    except Exception as e:
        print(e)

    finally:    
        doc = None
        excel = None

And to go one level deeper use combination of os.path.basename and os.path.join:

path = r'C:\Users\folder\Desktop\Test Binary'

...
csv_name = os.path.basename(file).replace('.xlsb', '.csv')

doc.SaveAs(Filename = os.path.join(path, 'Conversion_Files', csv_name), FileFormat = 6)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank you very much - this does it perfectly. – Hatt Apr 18 '18 at 17:13
  • That will fail in the case of `C:\My xlsb document.xlsb`; instead trim the string with slice syntax. Also, don't catch `Exception` if there's something more specific you can catch. – wizzwizz4 Apr 18 '18 at 17:13
  • @Hatt It has two small errors that might cause problems later down the line, but they're easily remedied. – wizzwizz4 Apr 18 '18 at 17:14
  • @wizzwizz4 You mean the answer has two small errors? My entire process here is take a bunch of .xlsb files (still need to explicitly take the correct tab), convert to csv, concatenate csvs into one file with a new column containing original file name, and analyze. Will that cause any issues? – Hatt Apr 18 '18 at 17:30
  • @wizzwizz4 In this specific instance it should only show up in the file extension - basically we have many locations that use a legacy .xlsb workbook as a form and save their data with the original file name and their location. I could rewrite the all_files_test variable - I just started the script off that way but sounds like there's a cleaner/safer way to do it. – Hatt Apr 18 '18 at 17:40
  • Ah - gotcha - I'll go ahead and make the switch now, might as well keep as safe as possible. @wizzwizz4 Sorry one other question - it sounds simple but I'm getting errors just now - if i wanted to change the out folder to one level deeper (Conversion_Files) how can I add that in? And thank you very much for the feedback. – Hatt Apr 18 '18 at 17:47
  • @Hatt There were too many changes for me to post in comments, so I've posted an answer. – wizzwizz4 Apr 18 '18 at 18:04
  • @Hatt, you can also use period in the replace to match on extension. I doubt files will contain *xlsb* in the name. Also, to go one level deeper, use `os.path.join` on the *path* string. See edit. – Parfait Apr 18 '18 at 18:11
  • @Parfait Thank you for that edit - I tried that four different ways but should have gone back to os. Thank you for that update. – Hatt Apr 18 '18 at 18:15
2

Parfait's answer is good, but has a few flaws. I have remedied those (that I have noticed) in this answer, and refactored out some context managers to make the logic easier to understand (and hence easier to modify).

It now prints failed files to sys.stdout (to let you recover, Unix-style, by replacing the for loop with repeated input() / f.readline()[:-1] calls), and only opens the Excel COM object once; this should be a lot faster.

I have also added support for recursively performing this match, but this feature requires Python 3.5 or above in order to work.

import os
import glob
import traceback
from contextlib import contextmanager
import win32com.client
from pythoncom import com_error

PATH = r'C:\Users\folder\Desktop\Test Binary'

@contextmanager
def open_excel():
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    try:
        yield excel
    finally:
        excel.Quit()

@contextmanager
def open_workbook(excel, filename):
    doc = excel.Workbooks.Open(filename)
    try:
        yield doc
    finally:
        doc.Close(True)

all_files_test = glob.glob(os.path.join(PATH, "**.xlsb"), recursive=True)

with excel_cm() as excel:
    for file in all_files_test:    
        try: 
            with open_workbook(file) as doc:
                doc.SaveAs(Filename=file[:-4] + 'csv', FileFormat=6)
        except com_error as e:
            print(file)
            traceback.print_exc()
wizzwizz4
  • 6,140
  • 2
  • 26
  • 62
  • This is practically my answer! And `Exception as e` [catches *any* raised error](https://stackoverflow.com/a/4992124/1422451). – Parfait Apr 18 '18 at 18:13
  • @Parfait I know, which is why it's not the best plan to use `Exception`; it doesn't catch `KeyboardInterrupt` et al. as `BaseException` would have, but we don't want to catch an `IndexError`, for example. – wizzwizz4 Apr 18 '18 at 18:40