0

I wrote a tool that extracts data from a large DB and outputs it to an Excel file along with (conditional) formatting to improve readability. For this I use Python with openpyxl on a Linux machine. It works great, but this package is rather slow for writing Excel.

It seems to be a lot quicker to dump the table as (compressed) csv, import that into Excel and apply formatting there using a macro/vba.

To automate the process I'd like to create an empty Excel file pre-loaded with the required VBA to do the formatting; a template. For every data dump, the data is embedded (compressed using deflate) into the Excel file and loaded into the Workbook upon opening the document (or using a "LOAD" button to circumvent macro related security things).

However, just adding some file into the Excel file raises an error when opened:

We found a problem with some content in 'Werkmap1_test_embed.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Clicking Yes opens the file and shows some tracing information as XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <logFileName>Repair Result to Werkmap1_OLE_Word0.xml</logFileName>
  <summary>Errors were detected in file '/Users/joostk/mnt/cluster/Werkmap1_OLE_Word.xlsx'</summary>
  <additionalInfo>
      <info>Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.</info>
  </additionalInfo>
</recoveryLog>

Is it possible to avoid this? How would I embed a file into the Excel ZIP? Do I need to update some file table (which I could not file easily).

When that's done, I'd like to import the data. Can I access files in the Excel ZIP from VBA? I guess not, and I need to extract the data to some temporary path and load it from there.

I have found these helpful answers elsewhere to load ZIP and plain text:

Many thanks for sharing your thoughts!

hzpc-joostk
  • 361
  • 3
  • 7
  • what is the full error of ` "Excel found unreadable content ..."`? Second, when you open the Excel file is it opened in "recovered" mode? Lastly, is the VBA code the thing that is supplying/writing the data using Named Ranges? – MattR Nov 29 '17 at 17:58
  • Thanks @MattR. I have included the full error message. Note that I am using Office 2016 for Mac, but it should also (or at least) work for Office for Windows. I have not written VBA code, yet. But it probably will just dump the csv as table, not using named ranges. – hzpc-joostk Dec 01 '17 at 08:57

1 Answers1

0

so my "Answer" here is that this is caused by using Named Ranges, or an underlying table, or an embedded Query/Connection. When you start manipulating this file you will get the error that you are talking about:

enter image description here

There is no harm to the file if you click "yes" and open. Excel will open this in Repaired Mode which will require you to re-save the file.

The way I've worked around this is to re-read the "repaired" file, in python, and save it as another file or replace it. Essentially just do an extra step of re-reading the data into memory, and write it to a new file. The error will go away. As always, test this method before deploying to production to ensure no records are lost. The way I solve it is with two lines of pandas.

import pandas as pd
repair = pd.read_excel('PATH_TO_REPAIR_FILE')
new_file = repair.to_excel('PATH_TO_WHERE_NEW_FILE_GOES')
MattR
  • 4,887
  • 9
  • 40
  • 67
  • Thanks again @MattR. Unfortunately, this didn't work, as I want to put the data into a CSV which is added to the Excel archive using an external ZIP program. The embedded CSV is lost when reading and writing just the Excel data using pandas (great package btw), as openpyxl does not take these additional files into account. I will update my question to enlighten this. – hzpc-joostk Dec 01 '17 at 16:06