I have a `file.gz` (not a `.tar.gz`!) or a `file.zip`. It contains a single file named `1.txt` (a 20 GB text file with tens of millions of lines).

- Without saving `1.txt` to disk as a whole (this requirement is the same as in my previous question), I want to extract all of its lines that match one regular expression and don't match another regex.
- The resulting `.txt` files must not exceed a predefined limit, say, one million lines each.

That is, if there are 3.5M lines in `1.txt` that match those conditions, I want to get 4 output files: `part1.txt`, `part2.txt`, `part3.txt`, `part4.txt` (the last one containing 500K lines), that's all.
I tried to make use of something like

```
gzip -c path/to/test/file.gz | grep -P --regexp='my regex' | split -l1000000
```

But the above doesn't work. Maybe Bash can do it, as in my previous question, but I don't know how.
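From the man pages, I suspect two problems with my attempt: `gzip -c` *compresses* its input (decompression needs `-d`, i.e. `gzip -dc` or `zcat`), and `split` needs options to produce numbered `.txt` files. A sketch of what I think should work (the regexes, paths, and the 2-line limit are placeholders for the demo; GNU `grep`/`split` assumed):

```shell
# Build a tiny demo .gz so the pipeline can be tested end to end.
printf 'keep 1\ndrop me\nkeep 2\nkeep 3\n' | gzip > /tmp/demo.gz

# Decompress to stdout, keep lines matching one regex, discard lines
# matching another, then split into numbered files of at most 2 lines.
# (-d gives numeric suffixes starting at 00, not 1 as in my example.)
gzip -dc /tmp/demo.gz \
  | grep  -P 'keep' \
  | grep -vP 'drop' \
  | split -l 2 -d --additional-suffix=.txt - /tmp/part

cat /tmp/part00.txt
cat /tmp/part01.txt
```

For the `file.zip` case, I assume `unzip -p file.zip` could replace `gzip -dc file.gz` as the first stage, since it also streams the member to stdout.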