Use zcat and sed or awk to edit compressed .gz text file

Question

I am trying to edit compressed fastq.gz text files by removing the first six characters of every fourth line, starting with line 2 (lines 2, 6, 10, 14, ...). I have two ways of doing this right now, using either awk or sed, but they only seem to work if the files are unzipped. I would like to edit the files without unzipping them, and tried the following code without getting it to work. Thanks.

Using sed:

zcat /dir/* | sed -i~ '2~4s/^.\{6\}//'

Using awk:

zcat /dir/* | awk 'NR%4==2 {gsub(/^....../,"")} 1'

You can't edit a compressed file in-place. You have to uncompress it, edit it, and then recompress it. Also, regardless of compression, `sed -i` won't work with a pipe: it has no way to write back to its input that way. It has to be given a named file. – Mark Reed, Feb 17 '15 at 17:47
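To illustrate the comment above, here is a minimal sketch of the decompress, edit, recompress chain writing to a new file rather than back to the input. The temp directory, the file names `sample.fastq.gz` and `trimmed.fastq.gz`, and the two sample reads are made up for the demo; the awk program is the one from the question, with `sub` in place of `gsub` since an anchored pattern can match only once per line.

```shell
# Hypothetical demo data: a tiny two-record FASTQ file in a temp directory.
demo_dir=$(mktemp -d)
printf '%s\n' '@read1' 'AAAAAACGTACGT' '+' 'IIIIIIIIIIIII' \
              '@read2' 'TTTTTTGGCCGGC' '+' 'JJJJJJJJJJJJJ' |
  gzip > "$demo_dir/sample.fastq.gz"

# Decompress to stdout, drop the first six characters of lines 2, 6, 10, ...,
# and recompress into a NEW file. No -i here: the editor reads from a pipe.
gzip -cd "$demo_dir/sample.fastq.gz" |
  awk 'NR % 4 == 2 { sub(/^....../, "") } 1' |
  gzip > "$demo_dir/trimmed.fastq.gz"

# Show the edited text; lines 2 and 6 are now CGTACGT and GGCCGGC.
gzip -cd "$demo_dir/trimmed.fastq.gz"
```

The original `sample.fastq.gz` is left untouched, so the result can be checked before anything is overwritten.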

Mark Reed · Accepted Answer · 2017-01-13T15:44:18.977

28

You can't bypass compression, but you can chain the decompress/edit/recompress together in an automated fashion:

for f in /dir/*; do
  cp "$f" "$f~" &&
  gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done

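If you're quite confident in the operation, you can remove the backup files by adding `&& rm "$f~"` after the final gzip command in the loop body, so that a backup is deleted only when the pipeline's last command succeeded.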
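A possible variant of the loop above, sketched under some assumptions: instead of keeping a `~` backup, it writes the edited stream to a temporary file and renames it over the original only if the input passes `gzip -t` and the final gzip succeeds. The function name `trim_gz`, the `.tmp` suffix, and the demo file are hypothetical, not part of the answer. The `2~4` address is a GNU sed extension, so this assumes GNU sed. A failure in the middle of the pipeline (sed itself) would not be caught by this sketch.

```shell
# trim_gz FILE: hypothetical helper, not from the answer above.
# Trims the first six characters of lines 2, 6, 10, ... of a gzip file.
trim_gz() {
  src=$1
  out="$src.tmp"   # assumed scratch name next to the original
  # Check the input archive first, then edit into the scratch file,
  # and replace the original only if every step reported success.
  if gzip -t "$src" &&
     gzip -cd "$src" | sed '2~4s/^.\{6\}//' | gzip > "$out"; then
    mv "$out" "$src"
  else
    rm -f "$out"
    return 1
  fi
}

# Demo on made-up data in a temp directory.
demo=$(mktemp -d)
printf '%s\n' '@read1' 'AAAAAACGTACGT' '+' 'IIIIIIIIIIIII' |
  gzip > "$demo/sample.fastq.gz"
trim_gz "$demo/sample.fastq.gz"
gzip -cd "$demo/sample.fastq.gz"
```

To apply it to a whole directory, the same loop shape works: `for f in /dir/*; do trim_gz "$f"; done`.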

edited Jan 13 '17 at 15:44

answered Feb 17 '15 at 17:49

Mark Reed


Works like charm! – Kenneth Chirchir Aug 23 '23 at 11:32

score 2 · Answer 2 · answered Mar 13 '20 at 18:30

I wrote a script called zawk which can do this natively. It's similar to glenn jackman's answer to a duplicate of this question, but it handles awk options and several different compression mechanisms and input methods while retaining FILENAME and FNR.

You'd use it like:

zawk 'awk logic goes here' log*.gz

This does not address sed's "in-place" flag (-i).
