
I am trying to edit gzip-compressed FASTQ files (fastq.gz) by removing the first six characters of lines 2, 6, 10, 14, and so on. I have two ways of doing this, one using awk and one using sed, but both seem to work only on uncompressed files. I would like to edit the files without unzipping them; I tried the following commands but could not get them to work. Thanks.

Using sed:

zcat /dir/* | sed -i~ '2~4s/^.\{6\}//'

Using awk:

zcat /dir/* | awk 'NR%4==2 {gsub(/^....../,"")} 1'
The Nightman
  • You can't edit a compressed file in-place. You have to uncompress it, edit it, and then recompress it. Also, regardless of compression, `sed -i` won't work with a pipe - it has no way to write back that way. Has to be a named file. – Mark Reed Feb 17 '15 at 17:47

2 Answers


You can't bypass compression, but you can chain the decompress/edit/recompress together in an automated fashion:

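for f in /dir/*; do
  # Back up the file, then decompress the backup, strip the first six
  # characters of lines 2, 6, 10, ... (GNU sed's first~step address), and
  # recompress over the original.
  cp "$f" "$f~" &&
  gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done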

If you're quite confident in the operation, you can remove the backup files by adding `rm "$f~"` to the end of the loop body.
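A variation on the same decompress/edit/recompress idea is sketched below, under a few assumptions: bash is available (for `pipefail`), the function name `trim_fastq_gz` is made up for illustration, and the awk command from the question (with `sub` in place of `gsub`) stands in for GNU sed's `2~4` address so the sketch does not depend on GNU sed. The edited data goes to a temporary file that replaces the original only if every stage of the pipeline succeeds and the result passes `gzip -t`, so no separate backup copy is kept.

```shell
# Make a pipeline fail if any stage fails, not only the last one (bash).
set -o pipefail

# trim_fastq_gz (hypothetical name): remove the first six characters of
# lines 2, 6, 10, ... in one gzip-compressed file, replacing it on success.
trim_fastq_gz() {
  local f=$1 tmp
  # Create the temporary file next to the original so mv stays on one filesystem.
  tmp=$(mktemp "${f}.XXXXXX") || return 1
  if gzip -cd -- "$f" \
       | awk 'NR % 4 == 2 { sub(/^....../, "") } 1' \
       | gzip > "$tmp" \
     && gzip -t < "$tmp"; then
    # Everything succeeded and the new archive is readable: swap it in.
    mv -- "$tmp" "$f"
  else
    # Something failed: leave the original untouched and clean up.
    rm -f -- "$tmp"
    return 1
  fi
}
```

It could be called in a loop such as `for f in /dir/*.gz; do trim_fastq_gz "$f"; done`.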

Mark Reed

I wrote a script called zawk which can do this natively. It's similar to glenn jackman's answer to a duplicate of this question, but it handles awk options and several different compression mechanisms and input methods while retaining FILENAME and FNR.

You'd use it like:

zawk 'awk logic goes here' log*.gz

This does not address sed's "in-place" flag (-i).
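For readers who only need the read-only case, the general idea of running awk over compressed input can be sketched as follows. This is not zawk itself: `gz_awk` and the awk variable `file` are hypothetical names, only gzip is handled, and because each file is piped through its own awk process, `FILENAME` is empty and `NR` restarts for every file.

```shell
# gz_awk (hypothetical helper, not zawk): run one awk program over the
# decompressed contents of each gzip file, writing results to stdout.
gz_awk() {
  local prog=$1 f
  shift
  for f in "$@"; do
    # awk only sees stdin here, so pass the original name in as a variable.
    gzip -cd -- "$f" | awk -v file="$f" "$prog"
  done
}
```

For example, `gz_awk 'NR % 4 == 2 { sub(/^....../, "") } 1' /dir/*.gz` would print the edited text of every file to stdout without modifying the archives.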

Adam Katz