53

I'm trying to create an xlsx file programmatically on iOS. Since the internal data of an xlsx file is basically stored in separate XML files, I tried to recreate the xlsx structure with all its files and subdirectories, compress them into a zip file, and change its extension to xlsx. I use the GDataXML parser/writer to create all the necessary XML files. However, the file I get can't be opened as an xlsx file. Even if I extract all the data from a valid xlsx file, create all the XML files manually by copying data from the original XML files, and compress them manually, I can't recreate a valid xlsx file.

The questions are:

  • is xlsx really just an archive containing xml files?
  • how do I create a valid xlsx file programmatically if I can't just compress xml files into zip file and set its extension to xlsx?
nick130586

4 Answers

85

In answer to your questions:

  1. XLSX is just a collection of XML files in a zip container. There is no other magic.
  2. If you decompress/unzip a valid XLSX file, then recompress/zip it, and you can't read the resulting output, the problem is generally with the files being rezipped or, less likely, with the zipping software. The main thing to check is that the directory structure was maintained in the zip file.

Example of the contents of an xlsx file:

unzip -l example.xlsx
Archive:  example.xlsx
  Length     Date   Time    Name
 --------    ----   ----    ----
      769  10-15-14 09:23   xl/worksheets/sheet1.xml
      550  10-15-14 09:22   xl/workbook.xml
      201  10-15-14 09:22   xl/sharedStrings.xml
      ...

I regularly unzip XLSX files, make minor changes for testing and re-zip them without any issue.
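To illustrate the "no other magic" point for a file built from scratch, here is a minimal sketch in Python using only the standard `zipfile` module. The part contents are my assumption of the smallest set a spreadsheet reader will accept (content types, package relationships, workbook, workbook relationships, one sheet with an inline string); the function name `write_minimal_xlsx` is hypothetical, and I have not checked the result against every Excel version.

```python
import zipfile

XML = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
PKG = "http://schemas.openxmlformats.org/package/2006"
DOC = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
SML = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"

# Assumed minimal set of parts; entry names are relative to the archive root.
PARTS = {
    "[Content_Types].xml": XML
    + f'<Types xmlns="{PKG}/content-types">'
    + '<Default Extension="rels" ContentType='
    + '"application/vnd.openxmlformats-package.relationships+xml"/>'
    + '<Default Extension="xml" ContentType="application/xml"/>'
    + f'<Override PartName="/xl/workbook.xml" ContentType="{CT}.sheet.main+xml"/>'
    + f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT}.worksheet+xml"/>'
    + "</Types>",
    "_rels/.rels": XML
    + f'<Relationships xmlns="{PKG}/relationships">'
    + f'<Relationship Id="rId1" Type="{DOC}/officeDocument" Target="xl/workbook.xml"/>'
    + "</Relationships>",
    "xl/workbook.xml": XML
    + f'<workbook xmlns="{SML}" xmlns:r="{DOC}">'
    + '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
    + "</workbook>",
    "xl/_rels/workbook.xml.rels": XML
    + f'<Relationships xmlns="{PKG}/relationships">'
    + f'<Relationship Id="rId1" Type="{DOC}/worksheet" Target="worksheets/sheet1.xml"/>'
    + "</Relationships>",
    "xl/worksheets/sheet1.xml": XML
    + f'<worksheet xmlns="{SML}"><sheetData><row r="1">'
    + '<c r="A1" t="inlineStr"><is><t>Hello</t></is></c>'
    + "</row></sheetData></worksheet>",
}


def write_minimal_xlsx(path):
    """Write every part at the archive root using Deflate compression."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml in PARTS.items():
            zf.writestr(name, xml)
```

Calling `write_minimal_xlsx("minimal.xlsx")` and then running `unzip -l minimal.xlsx` should show the five entries with no leading folder name. The same layout applies on iOS, whatever zip library is used there.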

Update: The important thing is to avoid zipping the parent directory. Here is an example using the zip system utility on Linux or OS X:

# Unzip an xlsx file into a directory.
unzip example.xlsx -d newdir

# Make some valid changes to the files.
cd newdir/
vi xl/worksheets/sheet1.xml

# Rezip the files *FROM* the unzipped directory.
# Note: you could also re-zip to the original file if required.
# -print0 / -0 keep file names that contain spaces intact.
find . -type f -print0 | xargs -0 zip ../newfile.xlsx

# Check the file looks okay (xdg-open is Linux-specific; on OS X use "open").
cd ..
unzip -l newfile.xlsx
xdg-open newfile.xlsx
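For anyone who needs to do the re-zip step from code rather than from a shell, the following is a rough Python equivalent of the commands above, assuming the standard `zipfile` module. The function name `zip_xlsx_dir` is hypothetical. It stores each file under its path relative to the unzipped directory, so no parent folder ends up in the archive, and it skips `.DS_Store`. It deliberately does not skip all dot-files, because `_rels/.rels` is a required part.

```python
import os
import zipfile


def zip_xlsx_dir(src_dir, out_path):
    """Zip the contents of src_dir with entry names relative to src_dir.

    out_path should live outside src_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                if name == ".DS_Store":  # Finder metadata, not an xlsx part
                    continue
                full = os.path.join(root, name)
                # Path relative to src_dir, using "/" as zip entries expect.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
```

Usage would look like `zip_xlsx_dir("newdir", "newfile.xlsx")`, mirroring the directory and file names in the shell example.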
jmcnamara
  • 7-zip version 9.20 is not doing the job. What software do you use that has the correct default values for Office's xlsx? – Pedro Reis Oct 15 '14 at 07:24
  • 7-zip version 9.20 did the trick, but as David said, we must not make a zip with the parent folder. The zip (that will be renamed to xlsx) must directly contain the _rels, docProps and xl folders. – Pedro Reis Oct 15 '14 at 07:31
  • I've added an example. – jmcnamara Oct 15 '14 at 08:29
  • Most important statement in the answer: "The important thing is to avoid zipping the parent directory" – Zach Green Aug 10 '16 at 14:19
  • Make sure hidden files are not added (such as .DS_Store on Mac). I'm sure that can be filtered out with find – Maragues Sep 08 '16 at 08:46
  • Why is it hard to open a password-protected xlsx on Linux then? If "it's only a zip", opening a password-protected xlsx would be easy. But no, it's not like that. All I have found is that I need to work under Win32, which is ONLY supported by Windows. So what are you going to say? – Máxima Alekz Jan 08 '18 at 05:10
  • @Máxima Alekz Password protected files aren't in a zip format. AFAIK, they are an OLE Compound Doc format. Probably something similar to the older xls format. – jmcnamara Jan 08 '18 at 16:14
36

If I decompress an xlsx file into a folder and then I recompress it again, the xlsx becomes corrupt / not recognized. In my case, the cause is that my zip tool is using the folder name as the first level for the relative path of each file inside the zip.

I have solved the problem by creating an empty zip file INSIDE the folder with the xlsx contents and then adding all the files and folders to it.

Actually, if you try to zip the folder itself, the file is not a valid xlsx. You should rather go inside the folder, select all the contents and then right-click & zip.
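A quick way to confirm this failure mode is to look at the entry names inside the archive. The sketch below, in Python with the standard `zipfile` module, returns the wrapping folder name when every file sits under one top-level folder and no file is at the root. The function name `wrapping_folder` is made up for illustration.

```python
import zipfile


def wrapping_folder(path):
    """Return the single top-level folder that wraps all files, or None."""
    with zipfile.ZipFile(path) as zf:
        # Ignore explicit directory entries; only file paths matter here.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    if not names or any("/" not in n for n in names):
        # At least one file is at the root (as [Content_Types].xml should be).
        return None
    tops = {n.split("/", 1)[0] for n in names}
    return tops.pop() if len(tops) == 1 else None
```

For a correctly zipped xlsx this returns `None`, because `[Content_Types].xml` is at the root. For an archive made by zipping the folder itself it returns that folder's name.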

David
2

I was using WinZip 15.5 to rezip xlsx xml files. Different compression types produced different results.

Note: The original file size was 555KB.

  • .Zip: New filesize 3,279KB (!). Excel can open.
  • .Zipx: New filesize 341KB. Excel couldn't open.
  • Zip SuperFast: New filesize 606KB. Excel can open.
  • Zip Enh. Deflate: New filesize 429KB. Excel couldn't open.
  • Zipx bzip2: New filesize 333KB. Excel couldn't open.
  • Zipx LZMA: New filesize 328KB. Excel couldn't open.
  • Zipx PPMd: New filesize 317KB. Excel couldn't open.

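Conclusion: Of the two settings whose output Excel could open, only Zip SuperFast kept the file close to its original size. Plain .Zip also opened, but the file grew roughly sixfold.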
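One plausible reading of these results, which is my assumption and not something tested in this answer, is that Excel only accepts entries that are stored uncompressed or compressed with plain Deflate, and rejects the newer methods (Deflate64, bzip2, LZMA, PPMd). A small Python sketch using the standard `zipfile` module can list which entries in an archive use some other method. The function name `unsupported_entries` is hypothetical.

```python
import zipfile

# Method IDs assumed acceptable for xlsx: 0 (stored) and 8 (Deflate).
ALLOWED = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def unsupported_entries(path):
    """Return (entry name, compression method ID) for entries outside ALLOWED."""
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in ALLOWED
        ]
```

An empty list means every entry is stored or Deflate-compressed. Reading the entry list does not decompress anything, so this should also work on archives whose methods Python itself cannot extract.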

Crozz
2

I was having issues and found that I was zipping at the wrong folder level. You need to navigate into the folder created when you unzipped the xlsx and zip the actual files, not the container folder. Dummy me. I'm sharing my story in case it helps others save time...

João