
We all know about tools, third-party or built into the OS, for treating compressed files as folders. But does anyone know how to do the reverse: trick the OS into thinking that a standard folder containing some files is actually a zipped file? Solutions for Windows, Linux and Mac are sought (though I realize no single solution will work across all these platforms).

The context of the question is in getting source code version control systems like SVN, Git or Mercurial to be more efficient at storing diffs between versions of documents that are actually compressed folders (holding various XML files, a bit of metadata and a thumbnail or two), such as ODT and DOCX.

I already know about Zipdoc and similar utils that use the Git and Mercurial encode/decode hooks to transform data into and out of the repository. This is a fine solution to the problem but I found myself wanting to browse the repository containing the uncompressed folder contents of the document and individually diff the files therein.

This means the uncompressed contents must be added to the repository, not the tar'd or zipped-without-compression version of the document. This in turn means a checkout from the repository produces an uncompressed folder full of files that represent the document. Hence my original question.

The mythical product I envision would detect a folder whose name contains a "registered" extension ("docx" for example) and then "re-mount" it as a compressed file of the same name.

Alternatively, does anyone know how to exploit the Git/Mercurial encode/decode hooks to achieve this dream?

textral
  • I must be misunderstanding you, because I don't see why you need to trick the OS into thinking a folder is compressed. Why not just include the regular uncompressed directory in the repo (with no special extension), and let the OS think it is uncompressed? Because it is. – Ben Lee Feb 13 '12 at 06:15
  • This question is very general. You'll likely get better responses if you can make it more specific. – David Wolever Feb 13 '12 at 06:33
  • @BenLee: So double-clicking on the "reinterpreted" file will open the appropriate registered app for files of that type/extension. I'm picturing a driver-level facility here. – textral Feb 13 '12 at 11:45
  • A driver-level util: that now makes me realize that Mercurial (or Git) would not be able to tell it was really a folder when it came time to commit modifications back to the repository. Oh well, I still wonder if you can make a folder appear to the OS as a compressed file. – textral Feb 13 '12 at 11:55
  • Even if it were "driver-level", I could well imagine some way, say a right-click context menu in the user interface, to unmount the file back to a normal folder prior to committing to the repository. – textral Feb 17 '12 at 05:26
  • You already mentioned "zipdoc". What about invoking it the opposite direction, zip before committing, unzip after updating? It's just a guess, I've never tried. – Christoph Jüngling Mar 28 '12 at 20:55
  • Solution for Git: http://stackoverflow.com/questions/30728630/what-should-i-do-if-i-put-docx-document-into-a-git-repository – Nick Volynkin Jun 09 '15 at 13:06
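
The zip-before-opening, unzip-before-committing round trip suggested in the comments can be approximated with a small script. The following is only a sketch: the function names `pack_folder` and `unpack_file` are hypothetical, and it assumes the folder and the document file use different paths (a folder and a file cannot share one name in the same directory). It writes an ODF `mimetype` entry first and uncompressed, which is what ODF packages expect; DOCX has no such entry, so for DOCX that branch simply never triggers.

```python
import zipfile
from pathlib import Path


def pack_folder(folder: Path, target: Path) -> None:
    """Zip the contents of `folder` into `target` (for example report.odt)."""
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    # Stable sort: move a top-level "mimetype" entry (ODF) to the front.
    files.sort(key=lambda p: p.relative_to(folder).as_posix() != "mimetype")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            name = path.relative_to(folder).as_posix()
            # ODF wants "mimetype" stored uncompressed; deflate everything else.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            archive.write(path, name, compress_type=method)


def unpack_file(source: Path, folder: Path) -> None:
    """Extract a zip-based document into `folder` so it can be committed as plain files."""
    with zipfile.ZipFile(source) as archive:
        archive.extractall(folder)
```

One would run `pack_folder` after a checkout and `unpack_file` before a commit, by hand or from a hook. This is a one-shot conversion, not the transparent "re-mount" the question asks for.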

2 Answers


To solve this in a nice way, you could use a Hurd translator with nsmux - though changing your kernel is likely a big step for that :)

http://www.gnu.org/software/hurd/hurd/translator/nsmux.html

You might be able to adapt the tarfs translator. This would allow you to open the folders via folder,,zip.

But it would require quite some work.

(This is a nice example of a really simple use case pointing to a rather complex problem.)

Arne Babenhauserheide

This program can help you:

https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe

Source code:

https://bitbucket.org/htilabs/ooxmlunpack

You have to configure a path; the program will process all files under it.
When run, it will

  • decompress all Office files (xlsx, xlsm, docx, ...)
  • zip them again with no compression (store only, much like a tarball)

Afterwards you have the "same" files, which now take up more disk space (but still open in Word/Excel). In this state, however, changes to these files need only a minimum of disk space within a repository (because they are not "binary" anymore).
As a by-product you also have the extracted content, which can be deleted if necessary.
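
The "zip again with no compression" step can be illustrated in a few lines of Python. This is a minimal sketch of the idea only, not OoXmlUnpack's actual code; the function name `repack_stored` is made up for the example, and it writes to a separate target file instead of replacing the document in place.

```python
import zipfile
from pathlib import Path


def repack_stored(source: Path, target: Path) -> None:
    """Copy every entry of a zip-based document into a new archive without compression."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # Fresh ZipInfo: keep the name and timestamp, drop the old compression settings.
            entry = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            dst.writestr(entry, data, compress_type=zipfile.ZIP_STORED)
```

Because the XML parts are then stored as plain bytes inside the archive, a small edit to the document changes only a small part of the file, which is what lets the repository store compact deltas.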

See also Version-controlling zipped files (docx, odt)

Michael Hutter