
I need to use a compressor like xz to compress huge tar archives.

I am fully aware of previous questions like "Create a tar.xz in one command" and "Utilizing multi core for tar+gzip/bzip compression/decompression".

From them, I have found that this command line mostly works:

tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz

I use the pipe solution because I absolutely must be able to pass options to xz. In particular, xz is very CPU intensive, so I must use -T0 to use all available cores. This is why I am not using other possibilities, like tar's --use-compress-program, or -J options.

However, I also want to capture all of tar's and xz's log output (i.e. non-archive output) into a log file. In the example above, log output is always generated by those -v options.

With the command line above, that log output is now printed on my terminal.

So, the problem is that when you use pipes to connect tar and xz as above, you cannot end the command line with something like

>Log_File  2>&1

because of that earlier

> OUTPUT_FILE.tar.xz

Is there a solution?

I tried wrapping in a subshell like this

(tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz) >Log_File  2>&1

but that did not work.

HaroldFinch
  • Did you try removing `-v` option? According to documentation it enables verbose mode. Also there's `-q` to suppress warnings, notices and errors. See `Other options` section for details: https://linux.die.net/man/1/xz – Igor Nikolaev Jan 25 '18 at 22:17
  • tee(1) is your friend, see https://en.wikipedia.org/wiki/Tee_(command). Also see https://www.gnu.org/software/coreutils/manual/html_node/tee-invocation.html#tee-invocation –  Jan 25 '18 at 22:30
  • Can you please expand on your error description of "that did not work"? Please give an example, what the result was, and what you expected to get. – that other guy Jan 25 '18 at 22:32
  • What "normal output" are you talking about? The normal, usual, expected and intended stdout of `xz` **is** the compressed file itself. There *is* no other content on stdout you could possibly want to capture as human-readable logs. – Charles Duffy Jan 25 '18 at 22:51
  • @Igor: I was aware that -v enables extra output and that -q would suppress that. I want the -v option for both tar and xz because I really want a log record of all the "meta data" of the archiving, such as the list of all the files that got archived. – HaroldFinch Jan 25 '18 at 22:56
  • @HaroldFinch, that `-v` content goes to stderr, not stdout. It is not part of "normal stdout". – Charles Duffy Jan 25 '18 at 22:56
  • See https://stackoverflow.com/questions/4919093/should-i-log-messages-to-stderr-or-stdout -- only "regular output" goes to stdout. The regular output of `xz` is a compressed file. Thus, **everything which is not the compressed file itself** goes to stderr. – Charles Duffy Jan 25 '18 at 22:59
  • (...now, some programs *do* check whether stderr is a TTY or not, and change their logging to be easier-to-parse in the non-TTY case. This is a feature, not a bug, since it means that you don't need to deal with parsing control codes for seeking the cursor around or such in your log analysis code -- but anyhow, if a program *does* do this, it's not a failure in your redirection somehow not capturing content, it's the program *not writing that content at all* because it detected that a redirection took place). – Charles Duffy Jan 25 '18 at 23:02
  • @Charles thanks for your flood of comments! You have enlightened me. So, if I modify my command line to `tar -cvf - paths_to_archive | xz -1 -T0 -v 1> OUTPUT_FILE.tar.xz 2> Log_File` I find that most of the output still appears on my terminal (a list of the files archived and then a final line like `execution time: 21 s`) but the redirect to Log_File did succeed in capturing a single line like `(stdin): 384.3 MiB / 1,359.3 MiB = 0.283, 66 MiB/s, 0:20`. So, not all of xz's diagnostic info was logged. Furthermore, is there any way that I can log tar's -v output? – HaroldFinch Jan 25 '18 at 23:10
  • @HaroldFinch, note the syntax in my answer where the `2>Log_File` is outside the pipeline that contains the curly braces -- that way you'll capture `tar -v`. – Charles Duffy Jan 25 '18 at 23:12
  • ...the `xz -v` progress bar is a different case, because xz simply *doesn't write it at all* if its stderr isn't a terminal. If you really want that content, then you need to emulate a TTY (which, fortunately, several common tools can do -- `script` and `unbuffer`, to name two). If you edit your question to make that aspect of the problem clear (showing specific content present in the TTY `xz -v` output but not present in the directed-to-a-file case), I'll edit my answer to address it. – Charles Duffy Jan 25 '18 at 23:12
  • (That said, could you also amend the question to include the exact version of `xz` you're running? Tools under that name have been built under the aegis of different programs/packages over time). – Charles Duffy Jan 25 '18 at 23:15
  • @Charles: when you say "note the syntax in my answer" I am not sure which answer (comment?) of yours you are referring to. – HaroldFinch Jan 25 '18 at 23:20
  • Answer, not comment. If you don't see it, try refreshing the page. `{ tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz; } 2>Log_File` -- see the curly braces? Note how the redirection of stderr is outside of them? – Charles Duffy Jan 25 '18 at 23:20

2 Answers


The normal stdout of tar is the tarball, and the normal stdout of xz is the compressed file. Neither of these is a log that you should want to capture. In this pipeline, all logging other than the output files themselves is written exclusively to stderr by both processes.

Consequently, you need only redirect stderr, and must not redirect stdout unless you want your output file mixed up with your logging.

{ tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz; } 2>Log_File
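For anyone who wants to try the grouped redirection on a scratch directory first, here is a minimal sketch. The temporary directory and file names are made up for the demo, and the gzip fallback is an assumption added only so the script still runs where xz is not installed.

```shell
#!/bin/sh
# Sketch: send the archive to one file and ALL diagnostics to another.
# Every path below is a throwaway name created just for this demo.
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'hello\n' > "$work/src/a.txt"
printf 'world\n' > "$work/src/b.txt"

# Assumption: fall back to gzip only so the demo runs without xz.
# (The output keeps the .tar.xz name either way, for simplicity.)
if command -v xz >/dev/null 2>&1; then
  compress='xz -1 -T0 -v'
else
  compress='gzip -1 -v'
fi

# stdout of the pipeline is the archive; the 2> outside the braces
# collects stderr from BOTH tar and the compressor.
{ tar -cvf - -C "$work" src | $compress > "$work/out.tar.xz"; } 2> "$work/archive.log"

# Show what was logged: tar's file list plus the compressor's summary.
cat "$work/archive.log"
```

The important detail is that `2>` sits after the closing brace, so it applies to every command inside the group rather than only to the last one in the pipeline.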

By the way -- if you're curious about why xz -v prints more content when its output goes to the TTY, the answer is in xz's message.c. The progress_automatic flag (which tells xz to set a timer that raises SIGALRM every second, treated as a cue to print status) is only set when isatty(STDERR_FILENO) is true. Thus, after stderr has been redirected to a file, xz no longer prints this output at all; the problem is not that it is incorrectly redirected, but that it no longer exists.
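The same kind of check can be reproduced from the shell, which may help when working out why a tool's output changes under redirection. This is only an illustration of the mechanism; `stderr_kind` is a hypothetical helper name, not something xz provides.

```shell
#!/bin/sh
# Report whether stderr (file descriptor 2) is attached to a terminal.
# `[ -t 2 ]` is the shell counterpart of isatty(STDERR_FILENO) in C.
stderr_kind() {
  if [ -t 2 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

stderr_kind               # result depends on how the script itself was started
stderr_kind 2>/dev/null   # stderr redirected: prints "not a terminal"
```

A program that branches on this test will behave differently as soon as `2>Log_File` is added, even though nothing about the redirection itself has gone wrong.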

You can, however, send SIGALRM to xz every second from your own code, if you're really so inclined:

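{
  # Run xz in the background, reading the tar stream via process substitution (bash)
  xz -1 -T0 -v > OUTPUT_FILE.tar.xz < <(tar -cvf - paths_to_archive) & xz_pid=$!
  # Once per second, ask xz for a status line; stop as soon as xz has exited
  while sleep 1; do
    kill -ALRM "$xz_pid" 2>/dev/null || break
  done
  wait "$xz_pid"  # collect xz's exit status
} 2>Log_File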

(Code that avoids rounding up the time needed for xz to execute to the nearest second is possible, but left as an exercise to the reader).
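One possible approach to that exercise, offered as a sketch rather than a tested recipe: run the once-per-second signalling loop in the background and wait on the compressor directly, so the caller returns as soon as the compressor exits. `tick_until_done` is a hypothetical helper name. It assumes the target process handles SIGALRM, as described above for xz; a process with the default SIGALRM disposition would be terminated by the signal.

```shell
#!/bin/sh
# Sketch: send SIGALRM to a background child once per second until it exits,
# without making the caller wait out the final sleep.
tick_until_done() {
  target=$1
  # Background loop: signal the target every second, stop once it is gone.
  ( while sleep 1; do kill -ALRM "$target" 2>/dev/null || break; done ) &
  ticker=$!
  # Block until the target finishes and remember its exit status.
  status=0
  wait "$target" || status=$?
  # Stop the signalling loop; ignore errors if it has already ended.
  kill "$ticker" 2>/dev/null || true
  wait "$ticker" 2>/dev/null || true
  return "$status"
}

# Intended use (bash, because of the process substitution):
#   {
#     xz -1 -T0 -v > OUTPUT_FILE.tar.xz < <(tar -cvf - paths_to_archive) &
#     tick_until_done $!
#   } 2>Log_File
```

The target must be a child of the calling shell, since `wait` only works on the shell's own children.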

Charles Duffy
  • Yup. Notably, they haven't given any detailed description of how they determine that to have failed. (If their real syntax were just a little different, putting the `2>&1` more immediately after the `>OUTPUT_FILE.log.xz`, then they'd be mixing their logs into their .xz file, thereby corrupting it). – Charles Duffy Jan 25 '18 at 23:06
  • @thatotherguy, ...frankly, I wouldn't be surprised if they were being thrown off by `xz` changing its logs based on whether stderr is a tty -- but if that's the real problem, they should edit the question to make it unambiguous. – Charles Duffy Jan 25 '18 at 23:07
  • @Charles: I see your answer now, sorry I missed it yesterday. I am accepting it as correct, many thanks! I have a few points to add. – HaroldFinch Jan 26 '18 at 15:58
  • @Charles First is that the final semicolon (";") inside your curly braces is really essential. I overlooked that the first time I tried your syntax, and it failed with a strange error. I am sure that you know this, I just mention this for anyone else reading this. – HaroldFinch Jan 26 '18 at 16:00
  • @Charles Second, your emphasis that tar and xz always log to stderr needs clarification. Yes, if you have them write their archive output to stdout, then they switch their logs to stderr. Makes perfect sense to me now--they have to, else they will mix log data with archive data! However, I was fooled because I usually have tar write its archive data to a file. **In that case, tar writes normal logs to stdout not to stderr.** Proof: `tar -cvf OUTPUT_FILE.tar paths_to_archive 1> stdout.log 2> stderr.log` writes logs (e.g. from the -v flag) to `stdout.log` while `stderr.log` is left empty. – HaroldFinch Jan 26 '18 at 16:14
  • @Charles and thatotherguy: I must have made a mistake in my testing, because the final code in my original question (to wrap in a subshell) now works fine for me. I like Charles's syntax better--no need to start up a subshell in this case. I also now understand how logging can change when you pipe. – HaroldFinch Jan 26 '18 at 16:26
  • @HaroldFinch, ...that's an unfortunate implementation decision (to write logs to stdout) made by the implementor of whichever version of `tar` you're using -- it directly contradicts the POSIX spec calling for "diagnostic output" to be written to stderr exclusively. That said, `tar` isn't a POSIX-specified tool (the POSIX-defined tool for creating archives is `pax`), so it's technically able to do whatever it wants. – Charles Duffy Jan 26 '18 at 17:01
  • @Charles: you have convinced me that stderr, despite its bad name, should be used for all logs. I, like many others, had thought that it was strictly for error logs. A good link for anyone interested is [this stackexchange question](https://unix.stackexchange.com/questions/331611/do-progress-reports-logging-information-belong-on-stderr-or-stdout) (which Charles has comments on). – HaroldFinch Jan 26 '18 at 18:01
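
The stdout-versus-stderr behaviour discussed in the comments above depends on the tar implementation, so it may be worth checking on your own system. A minimal sketch, using throwaway paths created only for the test:

```shell
#!/bin/sh
# Sketch: find out which stream your tar uses for the -v file list.
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'data\n' > "$work/src/a.txt"

# Case 1: archive written to a named file.
tar -cvf "$work/file.tar" -C "$work" src > "$work/file.out" 2> "$work/file.err"

# Case 2: archive written to stdout (a file stands in for the pipe here).
tar -cvf - -C "$work" src > "$work/pipe.tar" 2> "$work/pipe.err"

# Print how many lines each capture file received.
for f in file.out file.err pipe.err; do
  printf '%s: %s line(s)\n' "$f" "$(wc -l < "$work/$f" | tr -d ' ')"
done
```

As reported in the comment above, the listing in case 1 may land in `file.out` rather than `file.err`; other implementations may differ. In case 2 the listing has to go to stderr, because stdout carries the archive.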

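First, -cvf - can be shortened to cv on builds of tar whose default archive is stdout (common for GNU tar on Linux, but not guaranteed; the TAPE environment variable also overrides the default).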

But the normal stdout-output of tar cvf - is the tar file which is piped into xz. Not sure if I completely understand, maybe this:

tar cv paths | xz -1 -T0 > OUTPUT.tar.xz 2> LOG.stderr

or

tar cv paths 2> LOG.stderr | xz -1 -T0 > OUTPUT.tar.xz

or

tar cv paths 2> LOG.tar.stderr | xz -1 -T0 > OUTPUT.tar.xz 2> LOG.xz.stderr

Multithreaded compression with -T0 requires xz 5.2.0 or later; which version of xz do you use? (https://github.com/vasi/pixz may also be worth a closer look.)

The pv program, installed with sudo apt-get install pv on some systems, is better at showing progress for pipes than xz -v. Given the total input size up front, it reports progress as a percentage with an ETA:

size=$(du -bc path1 path2 | tail -1 | awk '{print$1}')
tar c paths 2> LOG.stderr | pv -s$size | xz -1 -T0 > OUTPUT.tar.xz
Kjetil S.