2

By default, git diff does not show what changed in .odt files

Binary files i/filename.odt and w/filename.odt differ

Is there a way to show what really changed while keeping the file directly editable in LibreOffice?

reducing activity
  • Possible duplicate of [How to diff .odt files with difftool? kdiff3 diff outputs unreadable characters](https://stackoverflow.com/questions/33448260/how-to-diff-odt-files-with-difftool-kdiff3-diff-outputs-unreadable-characters) – phd Oct 18 '18 at 13:15
  • @phd I am trying to do it in the command line, with git diff - not with an external tool. – reducing activity Oct 18 '18 at 13:26
  • I have written a bash script that converts LibreOffice documents into a git-friendly format; see https://github.com/timwiel/libreoffice2git – Tim Wiel May 20 '20 at 09:54

5 Answers

4

You could also use the flat XML format offered by LibreOffice: the .fodt file format. See "Libreoffice and version control" or this answer, which provides good links.

From the link:

If a document is saved as .fodt file it keeps the same data the .odt file would contain. Only that this time the data is represented as human-readable text (which makes the work much easier for the version control system) and not compressed. So saving a document as flat xml makes it possible to keep server space requirements and network load low at the relatively low cost of wasting a few kilobytes on the local hard disks.

Note that tiny changes will often still result in massive diffs, so this does not fully solve the problem.
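
Existing .odt files do not have to be re-saved by hand. As a minimal sketch, assuming LibreOffice's soffice binary is on the PATH and that its headless --convert-to mode accepts fodt as a target (worth verifying on your installation), a hypothetical helper such as odt_to_fodt could write a flat XML copy next to the original. The SOFFICE variable is an illustrative override introduced here, not something LibreOffice defines.

```shell
# Hypothetical helper: convert one .odt file to flat XML (.fodt) beside it.
# SOFFICE is an illustrative override for the LibreOffice binary name.
odt_to_fodt() {
    "${SOFFICE:-soffice}" --headless --convert-to fodt \
        --outdir "$(dirname "$1")" "$1"
}
```

Calling odt_to_fodt docs/report.odt would then be expected to leave docs/report.fodt in the same directory, which can be committed in place of the binary file.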

reducing activity
Philippe
  • It does not exactly match the question, but it is exactly what I wanted, so I edited the question a bit. – reducing activity Oct 19 '18 at 08:45
  • The .fodt format is basically useless for diffing, because LibreOffice keeps shuffling IDs around, drowning the content changes in irrelevant metadata changes. – l0b0 Feb 01 '23 at 05:36
2

I use the following to manage .odt and other Microsoft Office and LibreOffice files in git and to "git diff" them before committing.

Install the office-to-text converters:

$ sudo apt install unoconv catdoc
$ pip install python-pptx

Copy https://gitlab.com/wolframroesler/snippets/-/blob/master/git-pptx-textconv.py to a location of your choice and make it executable.

Add the following to ~/.gitconfig:

[diff "doc"]
    textconv=catdoc
[diff "odt"]
    textconv=odt2txt
[diff "odp"]
    textconv=odp2txt
[diff "ods"]
    textconv=ods2txt
[diff "ppt"]
    textconv=catppt
[diff "pptx"]
    textconv=/location/of/git-pptx-textconv.py

Add the following to ~/.config/git/attributes (or, alternatively, to the .gitattributes file in the project root):

*.doc diff=doc
*.odp diff=odp
*.ods diff=ods
*.odt diff=odt
*.ppt diff=ppt
*.pptx diff=pptx
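
A textconv command is simply a program that receives a file name as its argument and writes text to standard output, so a small script can stand in when odt2txt is unavailable. The following is a rough sketch and not part of the setup above: the function names are hypothetical, it assumes unzip and awk are installed, and it leaves XML entities such as &amp; undecoded.

```shell
# Hypothetical fallback for odt2txt; reads ODF content.xml on stdin.
odf_xml_to_text() {
    awk '{
        gsub(/<\/text:(p|h)>/, "\n")   # paragraph and heading ends become line breaks
        gsub(/<[^>]*>/, "")            # drop every remaining tag
        print
    }'
}

# Print the text of the .odt file given as the first argument.
odt_textconv() {
    unzip -p "$1" content.xml | odf_xml_to_text
}
```

Saved as an executable script that ends by calling odt_textconv "$1", this could be named as the textconv value of the "odt" driver in place of odt2txt.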

More details: https://gitlab.com/wolframroesler/snippets#manage-office-files-in-git

Wolfram Rösler
1

Note: As mentioned, ideally one should avoid versioning binary files, as they make comparing, integrating and resolving conflicts more difficult.


In git, you can configure a diff driver for each office file type so that files are converted to a plain-text representation before being compared.

Here are a few examples of tools that can be used:

  • catdoc (for Word)
  • catppt (for Powerpoint)
  • odt2txt (for Writer)
  • xls2csv (for Excel)

First, the file type of each office file can be configured globally in the $HOME/.config/git/attributes file:

*.doc binary diff=doc
*.odt binary diff=odt
*.ppt binary diff=ppt
*.xls binary diff=xls

Then, to globally configure the diff driver for each of those file types:

git config --global diff.doc.textconv catdoc
git config --global diff.odt.textconv odt2txt
git config --global diff.ppt.textconv catppt
git config --global diff.xls.textconv xls2csv
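
The same setup can be scoped to a single repository and then checked. This is a sketch under assumptions: the function names are hypothetical, only the odt driver is shown, and a .gitattributes file in the repository root is used instead of the global attributes file.

```shell
# Register the odt diff driver for one repository only.
setup_odt_diff() {
    repo=$1
    printf '*.odt diff=odt\n' >>"$repo/.gitattributes"
    git -C "$repo" config diff.odt.textconv odt2txt
}

# Print the diff driver git would select for the path given as $2.
odt_driver_for() {
    git -C "$1" check-attr diff -- "$2"
}
```

If the attribute is picked up, odt_driver_for prints a line of the form "path: diff: odt"; "unspecified" in place of "odt" would suggest that the attributes file is not being read.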

Source: https://medium.com/@mbrehin/git-advanced-diff-odt-pdf-doc-xls-ppt-25afbf4f1105

kelvin
0

Don't store .odt files in git. You can unzip them and store the contents, which are XML, instead. You might need to add newlines to the XML files, as they are, IIRC, just XML one-liners.

choroba
  • Doing that will in turn make editing them much more inconvenient, and I am not happy about storing a converted version that should be equivalent but will not be if there is a bug in the conversion pipeline. – reducing activity Oct 18 '18 at 13:25
  • Converting in this case is just zip/unzip, it should be bug free :-) – choroba Oct 18 '18 at 13:39
  • I still worry about a setup that has to trigger zip/unzip at the proper moments, on the proper files, and overwrite iff necessary. – reducing activity Oct 18 '18 at 13:46
0

For the basics: to diff the text in any zipped-XML format, you can use xmllint to pretty-print the XML files and diff those. Say you've run

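# Write both revisions of the document to temporary files.
file1=$(mktemp); git show master:summary.odt >"$file1"
file2=$(mktemp); git show feature:summary.odt >"$file2"
# Unpack each archive; 7z expects the output directory directly after -o, with no space.
extract1=$(mktemp -d); 7z x -o"$extract1" "$file1"
extract2=$(mktemp -d); 7z x -o"$extract2" "$file2"
# Pretty-print every XML member next to the original as <name>.pretty.
find "$extract1" "$extract2" -iname '*.xml' -execdir xmllint --format {} -o {}.pretty \;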

and you can now diff the .pretty files to see what changed. Pack that up with the usual scaffolding and you've got yourself a basic diff driver. You can even replace the XML with the prettified XML, edit it, and repack it; it all works.
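
One possible piece of that scaffolding, sketched on the assumption that both trees were extracted and pretty-printed as described: a hypothetical diff_pretty function that walks the .pretty files of the first tree and diffs each one against its counterpart in the second. Files that exist only in the second tree are not reported by this simple version.

```shell
# Diff every *.pretty file under $1 against the same relative path under $2.
diff_pretty() {
    old=$1
    new=$2
    (cd "$old" && find . -name '*.pretty' | sort) | while IFS= read -r rel; do
        if [ -f "$new/$rel" ]; then
            diff -u "$old/$rel" "$new/$rel"
        else
            echo "Only in $old: $rel"
        fi
    done
}
```

With the variables from the commands above, this would be invoked as diff_pretty "$extract1" "$extract2".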

jthill