The default git diff for .odt files does not show what was changed:
Binary files i/filename.odt and w/filename.odt differ
Is there a way to show what was really changed while keeping the file directly editable in LibreOffice?
You could also use the flat XML format offered by LibreOffice: the .fodt file format. See LibreOffice and version control, or this answer, which provides good links.
From the link:
If a document is saved as .fodt file it keeps the same data the .odt file would contain. Only that this time the data is represented as human-readable text (which makes the work much easier for the version control system) and not compressed. So saving a document as flat xml makes it possible to keep server space requirements and network load low at the relatively low cost of wasting a few kilobytes on the local hard disks.
Note that tiny changes will often still result in massive diffs, so this does not fully solve the problem.
I use the following to manage .odt and other Microsoft Office and LibreOffice files in Git and to "git diff" them before committing.
Install the office-to-text converters:
$ sudo apt install unoconv catdoc
$ pip install python-pptx
Copy https://gitlab.com/wolframroesler/snippets/-/blob/master/git-pptx-textconv.py to a location of your choice and make it executable.
Add the following to ~/.gitconfig:
[diff "doc"]
textconv=catdoc
[diff "odt"]
textconv=odt2txt
[diff "odp"]
textconv=odp2txt
[diff "ods"]
textconv=ods2txt
[diff "ppt"]
textconv=catppt
[diff "pptx"]
textconv=/location/of/git-pptx-textconv.py
Add the following to ~/.config/git/attributes (or, alternatively, to the .gitattributes file in the project root):
*.doc diff=doc
*.odp diff=odp
*.ods diff=ods
*.odt diff=odt
*.ppt diff=ppt
*.pptx diff=pptx
More details: https://gitlab.com/wolframroesler/snippets#manage-office-files-in-git
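If odt2txt is not available, a small Python script can act as the textconv filter instead. The sketch below is a hypothetical stand-in, not part of the linked snippets (the function name odf_text_lines is made up for illustration): it reads content.xml from a packaged .odt, or the whole file for a flat .fodt, and prints the text of each paragraph and heading on its own line, so git diff reports changed paragraphs rather than changed bytes. It assumes Python 3 and uses only the standard library.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for odt2txt, usable as a Git textconv filter.

Git runs the configured command with a file path appended and diffs
whatever the command prints.
"""
import sys
import xml.etree.ElementTree as ET
import zipfile

# ODF namespace for text elements such as <text:p> and <text:h>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odf_text_lines(path):
    """Return the text of each paragraph and heading, in document order."""
    if zipfile.is_zipfile(path):
        # A packaged .odt is a ZIP archive whose body lives in content.xml.
        with zipfile.ZipFile(path) as archive:
            xml_bytes = archive.read("content.xml")
    else:
        # A flat .fodt file is already a single XML document.
        with open(path, "rb") as handle:
            xml_bytes = handle.read()
    root = ET.fromstring(xml_bytes)
    # Simplification: tab, line-break and extra-space elements are ignored,
    # and footnote text nested inside a paragraph is printed twice.
    return ["".join(elem.itertext()) for elem in root.iter() if elem.tag in BLOCK_TAGS]


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print("\n".join(odf_text_lines(path)))
```

To try it, make the script executable and point the driver at it, e.g. textconv=/location/of/odf-text.py in the [diff "odt"] section (the path is a placeholder). Only the text is compared, so formatting-only changes will not show up in the diff.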
Note: As mentioned, ideally one should avoid versioning binary files, as they make comparing, integrating and resolving conflicts more difficult.
In Git, you can configure a diff driver for each office file type that converts the file to a plain-text representation before comparing.
Here are a few examples of tools that can be used: catdoc (.doc), odt2txt (.odt), catppt (.ppt) and xls2csv (.xls).
First, the file type of each office file can be configured globally in the $HOME/.config/git/attributes file:
*.doc binary diff=doc
*.odt binary diff=odt
*.ppt binary diff=ppt
*.xls binary diff=xls
Then, to globally configure the diff driver for each of those file types:
git config --global diff.doc.textconv catdoc
git config --global diff.odt.textconv odt2txt
git config --global diff.ppt.textconv catppt
git config --global diff.xls.textconv xls2csv
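Conceptually, a textconv driver only changes what Git compares: each version of the file is passed to the converter, and the converter's output is diffed as ordinary text. The following rough model (not Git's actual implementation; textconv_diff is a hypothetical helper) does the same with Python's difflib, which can be handy for checking what a converter such as odt2txt produces before wiring it into Git.

```python
import difflib
import subprocess
import sys


def textconv_diff(old_path, new_path, textconv_cmd):
    """Rough model of `git diff` with a textconv driver: both versions go
    through the same converter, and only the converted text is compared."""

    def convert(path):
        # The file path is appended to the converter command; whatever the
        # converter writes to standard output is what gets diffed.
        result = subprocess.run([*textconv_cmd, path], capture_output=True,
                                text=True, check=True)
        return result.stdout.splitlines(keepends=True)

    diff = difflib.unified_diff(convert(old_path), convert(new_path),
                                fromfile=old_path, tofile=new_path)
    return "".join(diff)


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Usage: textconv_diff.py OLD NEW CONVERTER [CONVERTER-ARGS...]
    sys.stdout.write(textconv_diff(sys.argv[1], sys.argv[2], sys.argv[3:]))
```

For example, python3 textconv_diff.py old.odt new.odt odt2txt prints a unified diff of the two text renderings.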
Source: https://medium.com/@mbrehin/git-advanced-diff-odt-pdf-doc-xls-ppt-25afbf4f1105
Don't store .odt files in Git. You can unzip them and store their contents, which are XML, instead. You might need to add newlines to the XML files, as they are, IIRC, just XML one-liners.
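If you commit the unpacked files, you need to rebuild the .odt before LibreOffice can open it. Below is a minimal sketch using Python's zipfile module; explode and pack are hypothetical helper names, and it assumes the directory contains only the unpacked document. The one ODF-specific detail is that the mimetype entry must come first in the archive and be stored uncompressed. The sketch keeps the XML bytes unchanged, so adding newlines (for example with xmllint --format) remains a separate step.

```python
import os
import zipfile


def explode(odt_path, out_dir):
    """Unpack an .odt (a ZIP archive) into out_dir so its files can be committed."""
    with zipfile.ZipFile(odt_path) as archive:
        archive.extractall(out_dir)


def pack(src_dir, odt_path):
    """Rebuild an .odt from a directory created by explode()."""
    with zipfile.ZipFile(odt_path, "w") as archive:
        # ODF requires "mimetype" to be the first entry, stored uncompressed.
        archive.write(os.path.join(src_dir, "mimetype"), "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for folder, subdirs, files in os.walk(src_dir):
            subdirs.sort()  # stable entry order across runs
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Archive member names always use forward slashes.
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    archive.write(full_path, arcname,
                                  compress_type=zipfile.ZIP_DEFLATED)
```

Empty directories from the original archive are not recreated (Git does not track them either), so it is worth checking that LibreOffice opens a rebuilt file before relying on this.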
For the basics, to diff the text in any zipped-XML format, you can use xmllint to format the XML files and diff those. Say you've done:
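# Write both versions of the document to temporary files
file1=$(mktemp) file2=$(mktemp)
git show master:summary.odt >"$file1"
git show feature:summary.odt >"$file2"
# Unpack both archives; 7z wants the output directory attached to -o
extract1=$(mktemp -d) extract2=$(mktemp -d)
7z x -o"$extract1" "$file1"
7z x -o"$extract2" "$file2"
# Save a pretty-printed copy of every XML file as <name>.pretty
find "$extract1" "$extract2" -iname '*.xml' -execdir xmllint --format {} -o {}.pretty \;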
and you can now diff the .pretty files to see what changed. Pack that up with the usual scaffolding and you've got yourself a basic diff driver. You can even replace the XML with the prettified XML, edit it, and repack it; it all works.
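The scaffolding can be sketched as a Git external diff driver. This is a hypothetical Python version, not the script described above: Git calls an external diff command with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode), and the sketch opens both versions as ZIP archives, pretty-prints every XML member with xml.dom.minidom instead of xmllint, and prints one unified diff per member. The pretty-printing is only used for display, so the whitespace it adds does not matter here.

```python
#!/usr/bin/env python3
"""Hypothetical external diff driver for zipped-XML files (.odt, .ods, ...)."""
import difflib
import sys
import xml.dom.minidom
import zipfile


def pretty_members(path):
    """Map each XML member of the archive to its pretty-printed lines."""
    if not zipfile.is_zipfile(path):
        return {}  # e.g. /dev/null when the file was added or deleted
    members = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                dom = xml.dom.minidom.parseString(archive.read(name))
                members[name] = dom.toprettyxml(indent="  ").splitlines(keepends=True)
    return members


def diff_archives(old_path, new_path, label):
    """Unified diff of the pretty-printed XML members of two archives."""
    old, new = pretty_members(old_path), pretty_members(new_path)
    chunks = []
    for name in sorted(old.keys() | new.keys()):
        chunks.extend(difflib.unified_diff(
            old.get(name, []), new.get(name, []),
            fromfile=f"a/{label}/{name}", tofile=f"b/{label}/{name}"))
    return "".join(chunks)


def main(argv):
    if len(argv) != 8:
        return  # not invoked by Git with the seven external-diff arguments
    # argv[1] is the path in the repository, argv[2]/argv[5] the old/new files.
    sys.stdout.write(diff_archives(argv[2], argv[5], argv[1]))


if __name__ == "__main__":
    main(sys.argv)
```

One way to register it is git config diff.zipxml.command /path/to/zipxml-diff.py together with an attribute such as *.odt diff=zipxml (the driver name and path are placeholders). git diff uses the driver directly, while git log -p needs --ext-diff.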