
I have to process a file using my Linux machine.

When I try to write my output to a csv file then gzip it in the same line of script:

processing > output.csv | gzip -f output.csv

I get an 'unexpected end of file' error. Even when I download the file using the Linux machine I get the same error.

When I do not gzip via terminal (or in a single line) everything works fine.

Why does it fail like this when the commands are all in a single line?

Zhe Sheng Lim

2 Answers


You should remove > output.csv

For the same stream (stdout), you can use either:

  • a pipe: | or
  • a redirection to a file: >

but not both at once, because > consumes stdout before the pipe can see it.

Errors on stderr can be redirected to a file with 2>errors.txt; otherwise they are displayed on screen.
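For example, with a stand-in for the `processing` command from the question (an assumption, since the real command isn't shown), the three variants look like this:

```shell
# Stand-in for the real "processing" command from the question
# (assumption: it writes CSV rows to stdout).
processing() { printf 'name,value\na,1\nb,2\n'; }

# stdout through a pipe: gzip compresses the stream directly
processing | gzip > output.csv.gz

# stdout redirected to a plain file instead
processing > output.csv

# stderr captured separately; without 2> it would print to the terminal
processing > output.csv 2> errors.txt
```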

JohannesB
  • I need to use the processing output as well, so I needed a file in the gz archive. – Zhe Sheng Lim May 31 '20 at 15:48
  • I don't understand what you mean but maybe look at `tee` or https://stackoverflow.com/questions/60942/how-can-i-send-the-stdout-of-one-process-to-multiple-processes-using-preferably – JohannesB May 31 '20 at 16:07

When you redirect a process's output with the > operator, there is nothing left on stdout to feed a pipe afterwards (the "output" has already gone to the file). Worse, the two sides of a pipe start concurrently, so in your one-liner gzip -f output.csv runs while processing is still writing the file; most likely it compresses an incomplete file, which is why decompressing later reports an unexpected end of file. You have two options:

  1. processing > output.csv &&
    gzip output.csv
    

    Writes the output of your program to the file output.csv and then, in a second step, gzips this file, replacing it with output.csv.gz. The && ensures gzip only starts after processing has finished successfully. Depending on the amount of data, this might not be feasible (the storage requirement is the full uncompressed output PLUS the compressed size).

  2. processing | gzip > output.csv.gz
    

    This will compress the output of your process in-line and write it directly to the output file, without storing the uncompressed output in an intermediate file.
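To check that the second approach actually produces a valid archive (an incomplete archive is exactly what yields the "unexpected end of file" error from the question), you can verify it with gzip -t, again using a stand-in for the `processing` command:

```shell
# Stand-in for the real "processing" command (assumption).
processing() { printf 'name,value\na,1\nb,2\n'; }

# Option 2: compress the stream in-line, no intermediate uncompressed file
processing | gzip > output.csv.gz

# gzip -t verifies archive integrity; a truncated archive would fail here
gzip -t output.csv.gz && echo "archive OK"

# Read the data back without extracting to disk
gzip -dc output.csv.gz
```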

knittl