So I have the following sed one-liner:
sed -e '/^S|/d' -e '/^T|/d' -e '/^#D=/d' -e '/^##/d' -e 's/^H|/,H|/' -e 's/^Q|/,,Q|/' -e '1 i\,,,' sample_1.txt > sample_2.txt
I have many lines that start with either:
S|
T|
#D=
##
H|
Q|
The idea is to drop the lines starting with one of the first four prefixes, to replace H| (at the beginning of a line) with ,H| and to replace Q| (at the beginning of a line) with ,,Q|. The last expression inserts a ,,, line before the first line.
But now I would need to:
- use the fastest way possible (internet suggests (m)awk is faster than sed)
- read from a .txt.gz file and save the result in a .txt.gz file, avoiding, if possible, the intermediate un-zip/re-zip
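For reference, a minimal sketch of what these two points could look like, not a benchmarked solution. It assumes the awk program below is an acceptable stand-in for the sed script, matching H| and Q| only at the start of a line as described. The function name filter_gz is hypothetical. gzip -dc decompresses to a pipe and gzip -c recompresses from one, so no uncompressed copy is written to disk. One behavioural difference: the BEGIN block always prints the ,,, header, whereas sed's 1 i\ is skipped when line 1 is itself one of the deleted lines. Whether mawk really beats sed here would need timing on a real file, since the gzip steps may dominate.

```shell
#!/bin/sh
# Sketch: stream-filter one gzip file without an uncompressed copy on disk.
# filter_gz is a hypothetical helper name.
# Usage: filter_gz sample_1.txt.gz sample_2.txt.gz

# Prefer mawk when it is installed, otherwise use whatever "awk" resolves to.
AWK=$(command -v mawk || command -v awk)

filter_gz() {
    gzip -dc "$1" | LC_ALL=C "$AWK" '
        BEGIN { print ",,," }                  # header line, always printed
        /^[ST]\|/ || /^#(#|D=)/ { next }       # drop S|, T|, ## and #D= lines
        /^H\|/ { print "," $0; next }          # H| at line start becomes ,H|
        /^Q\|/ { print ",," $0; next }         # Q| at line start becomes ,,Q|
        { print }                              # everything else unchanged
    ' | gzip -c > "$2"
}
```

LC_ALL=C is set on the assumption that the data is plain bytes, so awk does no multibyte handling.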
There are in fact several hundred .txt.gz files, each about 1 GB, to process this way (all in the same folder). Is there a CLI way to run the code in parallel on all of them, so that each core is assigned a subset of the files in the directory?
I use Linux (Ubuntu).
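To make the parallel part concrete, here is one possible sketch, assuming GNU findutils (find -maxdepth, xargs -0 -P) and coreutils (nproc), which Ubuntu ships by default. The function name process_dir and the out/ subdirectory are hypothetical choices. The filter is the sed script from the question with the two substitutions anchored at the line start. The ,,, header is written with printf instead of 1 i\ on the assumption that it is wanted even when the first line of a file is deleted.

```shell
#!/bin/sh
# Sketch: one gzip -> sed -> gzip pipeline per file, several files at a time.
# process_dir is a hypothetical helper name.
# Usage: process_dir /path/to/folder [jobs]   (jobs defaults to the core count)
process_dir() {
    dir=$1
    jobs=${2:-$(nproc)}
    mkdir -p "$dir/out"        # results go to out/, inputs stay untouched
    # -print0 / -0 keep unusual file names intact, -n 1 gives each job one
    # file, -P runs up to $jobs jobs concurrently.
    find "$dir" -maxdepth 1 -name '*.txt.gz' -print0 |
        xargs -0 -n 1 -P "$jobs" sh -c '
            in=$1
            out=$(dirname "$in")/out/$(basename "$in")
            {
                printf ",,,\n"
                gzip -dc "$in" |
                    LC_ALL=C sed -e "/^S|/d" -e "/^T|/d" -e "/^#D=/d" \
                        -e "/^##/d" -e "s/^H|/,H|/" -e "s/^Q|/,,Q|/"
            } | gzip -c > "$out"
        ' sh
}
```

Each job already uses more than one process (gunzip, sed, gzip), so it may be worth trying a jobs value below the core count and timing both.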