
I have LibreOffice Writer files under git control. I've previously used .odt format and used odt2txt to get readable diffs, by including

[diff "odt"]
    textconv = odt2txt

in my git config file. I'm trying to use the XML-text .fodt format instead, since the text .fodt format is more amenable to git than the binary .odt format.

The problem is that the git diffs are overwhelmingly of the XML tags, obscuring the actual text changed in the Writer files. It was actually much easier to see the diffs using odt and odt2txt than to use fodt.

Is there any program that will strip out all the XML tags, outputting only bare text (similar to what odt2txt outputs from an odt file), so that I can see in my diffs the actual text that was changed?

I am under Windows, but I use Cygwin to have access to a lot of Linux tools, including odt2txt. However, please note that the Linux-based suggestions such as specifying

textconv = sh -c 'odt2txt --raw-input "$0"'

do not work under Windows, even with Cygwin installed. Git under Windows appears to require a single command, without operands, as the filter.

(This is somewhat aggravated by the fact that I usually use SourceTree for my routine git usage, including looking at diffs, and SourceTree does not line-wrap its diffs, despite having an enhancement request open for a number of years; but even in native git it's an issue.)

codingatty
  • Is there a reason you are not happy with `odt2txt`? That is one of the answers at https://askubuntu.com/questions/975937/tool-for-viewing-libreoffice-writer-files-in-terminal-window/976085#976085 – Jim K Mar 23 '22 at 20:23
  • @JimK, odt2txt works with odt format; not with fodt format. – codingatty Mar 23 '22 at 22:49
  • Okay, but did you look at the other answers from that link? LibreOffice can (of course) read `fodt` format. Perhaps it would work with an approach similar to https://stackoverflow.com/questions/55601430/how-to-pass-a-filename-argument-gitconfig-diff-textconv Note: I use SourceTree but have not tried the kind of setup that you describe here. – Jim K Mar 24 '22 at 13:05
  • I understand LibreOffice can read fodt. As I said in my question, that's what I've started using, because it is text-based and better for git. I'm hoping for something to extract text from fodt the way odt2txt can extract text from odt. None of the answers at that link relate to this. – codingatty Mar 24 '22 at 15:05
  • `libreoffice --cat "Untitled 1.[f]odt"` — you say that is not related? It "extract[s] text from fodt the way odt2txt can extract text from odt." (Added my changes in `[]` to quoted portions). – Jim K Mar 24 '22 at 23:20
  • Trying that it doesn't produce any output. My initial assumption was that it only worked with .odt. After playing with it a while it still doesn't work, but for a different reason: if you already have an instance of LibreOffice running, a second one, even a command line w/ --cat does not. (I've checked and, if I close all LibreOffice files and exit, I can use --cat.) But I'll hunt around and see if there's a way of forcing a new instance, thanks. Note to any other Windows users: the windows exec is named `soffice` rather than `libreoffice`, but otherwise the command is the same. – codingatty Mar 25 '22 at 00:54
  • @Alfinal: it's a bit borderline. Basically, if you're going to write your own code and have a picky question about syntax or semantics or whatever, you go here; if you want to find some existing code that does something, you go to one of the other sites. (So your *answer* goes here, but it answers a different question: how do I fix odt2txt.c to !) – torek Jul 10 '22 at 02:00
  • @torek I see, you're right. Maybe @codingatty wants to edit/improve the question. Now we all know that you don't need an equivalent to odt2txt, because odt2txt itself does the work. You just have to change one boolean flag to automate it in `git diff`. – Alfinal Jul 10 '22 at 13:59
  • It's discussed in one of the answers below, but I have updated the question to indicate the approaches suggested in this comment string, which appear to be directed at Git under Linux, do not work for Git under Windows. With respect to whether the question belongs here, you will find thousands of questions tagged [git] that are along similar lines of how to use that program with respect to issues similar to this. – codingatty Jul 10 '22 at 19:52
  • With the update, I've snipped my original comment, so now the replies to it could go as well perhaps :-) – torek Jul 11 '22 at 00:47

2 Answers


Yes, use odt2txt --raw-input to generate a readable txt from a flat .fodt.

Unix systems / systems with sh or bash:

Since the textconv setting in the git config doesn't accept parameters directly, use this workaround as its value:

[diff "fodt"]
    textconv = sh -c 'odt2txt --raw-input "$0"'

All systems:

Change this line in the source code and then recompile (you could rename the program so that both versions coexist on your system; for instance, you could call it rawinputodt2txt). In file odt2txt.c, line 48, where it says:

static int opt_raw_input = 0;

write:

static int opt_raw_input = 1;

This build then uses --raw-input by default, and the only lines you need to add to the config file are:

[diff "fodt"]
    textconv = rawinputodt2txt

Extra explanation: the "--raw-input" option is only available from a recent build of version 0.5 (commit 7f18c95). The first releases carrying the 0.5 version number did not have it; for instance, version 0.5-1+b2 as packaged in Debian GNU/Linux lacks it.

Edit: Since it would probably be better for odt2txt to detect raw input automatically, I reported this in the odt2txt issue tracker, but I didn't write a complete patch for it.
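
If recompiling is inconvenient, another way to get a single command with no operands is a small wrapper that adds --raw-input itself. The following is only a minimal sketch in Python, assuming odt2txt (a build that supports --raw-input) is on the PATH; the script name fodt_textconv.py and the function names are hypothetical, and it has not been tested on Windows.

```python
import subprocess
import sys


def build_command(path):
    """Return the odt2txt invocation for one flat-ODT file."""
    # Assumption: an odt2txt build that supports --raw-input is on the PATH.
    return ["odt2txt", "--raw-input", path]


def main(argv):
    if len(argv) != 2:
        sys.stderr.write("usage: fodt_textconv.py FILE\n")
        return 2
    # git passes the file to convert as the only argument; the converted
    # text goes to stdout, which the child process inherits here.
    return subprocess.call(build_command(argv[1]))


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The textconv line would then name this wrapper instead of rawinputodt2txt. How the script becomes callable as one bare command (a shebang line, a .bat shim, or a packaged executable) depends on the platform and is left open here.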

Alfinal
  • Unfortunately, you can't specify a parameter such as --raw-input on the textconv statement in the git config file. – codingatty Jul 07 '22 at 15:14
  • @codingatty you're right. So I improved the answer to allow using `--raw-input` – Alfinal Jul 07 '22 at 20:04
  • Alas, the "sh" trick seems to be a Linux-only solution. It fails on Windows, even with Cygwin available – codingatty Jul 09 '22 at 00:16
  • I don't have a Windows machine to test on (nor the knowledge), so I suggest trying some of these: * maybe the Cygwin shell has another name, such as bash or dash? * is it possible in Windows to make some kind of alias, named `my_fodt.bat` (or `.exe` or `.py` or whatever you can build), that just executes `odt2txt --raw-input` when you run it? If so, you could write `[diff "odt"] textconv = my_fodt` instead * finally, you could change the source code to use `--raw-input` by default, as in my edit * extra: you could switch to the same kernel that the git developers develop on – Alfinal Jul 09 '22 at 04:51
  • Unfortunately it seems `odt2txt` is not maintained - [the last release was in 2014](https://github.com/dstosberg/odt2txt/tags). – l0b0 Feb 01 '23 at 06:11

I ended up hacking together my own fodt2txt driver. It's not perfect, but gets most of what I needed done.
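
For anyone wanting to try the same route, here is a rough sketch of what such a converter could look like. It is not the driver mentioned above, just a minimal illustration in Python, assuming the file is a well-formed flat ODT document that uses the standard OpenDocument `office` and `text` namespaces. The function names are made up for this example, and it ignores lists numbering, tables layout, and other formatting.

```python
import sys
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def _q(ns, name):
    """Build the '{namespace}name' form that ElementTree uses for tags."""
    return "{%s}%s" % (ns, name)


PARA_TAGS = (_q(TEXT_NS, "p"), _q(TEXT_NS, "h"))


def _inline_text(elem):
    """Collect the text of one paragraph or heading, including nested spans."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == _q(TEXT_NS, "s"):
            # <text:s text:c="N"/> stands for N consecutive spaces.
            parts.append(" " * int(child.get(_q(TEXT_NS, "c"), "1")))
        elif child.tag == _q(TEXT_NS, "tab"):
            parts.append("\t")
        elif child.tag == _q(TEXT_NS, "line-break"):
            parts.append("\n")
        elif child.tag in PARA_TAGS:
            pass  # nested paragraphs are emitted on their own by fodt_to_text
        else:
            parts.append(_inline_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def fodt_to_text(root):
    """Return one line of plain text per paragraph or heading in the body."""
    body = root.find("%s/%s" % (_q(OFFICE_NS, "body"), _q(OFFICE_NS, "text")))
    if body is None:
        return ""
    lines = [_inline_text(e) for e in body.iter() if e.tag in PARA_TAGS]
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    text = fodt_to_text(ET.parse(sys.argv[1]).getroot())
    # Write bytes so the output is UTF-8 regardless of the console code page.
    sys.stdout.buffer.write(text.encode("utf-8"))
```

As with any textconv driver, git only applies it to files matched in `.gitattributes`, for example with a line such as `*.fodt diff=fodt`.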

codingatty