I tested these commands on a file with 10 million lines, and I hope you will find them useful.
Extract the header (the first 30 lines of your file) into a separate file, header.txt:
perl -ne 'print; exit if $. == 30' 1.8TB.txt > header.txt
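If you want to double-check it before editing, counting the lines of header.txt is a cheap sanity check (just a suggestion, not part of the recipe):

wc -l header.txt

It should report 30.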
Now you can edit the file header.txt to add an empty line or two at its end, as a visual separator between it and the rest of the file.
Now copy your huge file, from the 5 millionth line to the end, into the new file 0.9TB.txt. Instead of 5000000, enter the number of the line you want to start copying from, since you say you know it:
perl -ne 'print if $. >= 5000000' 1.8TB.txt > 0.9TB.txt
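An alternative, which I have not benchmarked on a file this size, is tail: the -n +K form starts printing at line K, so this should produce the same output as the $. >= 5000000 condition above:

tail -n +5000000 1.8TB.txt > 0.9TB.txt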
Be patient, it can take a while. You can run the top command to see what's going on, and you can track the growing file with tail -f 0.9TB.txt.
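For example, assuming watch is installed, you can check the size of the growing file every 10 seconds without scrolling output:

watch -n 10 ls -lh 0.9TB.txt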
Now merge header.txt and 0.9TB.txt:
perl -ne 'print' header.txt 0.9TB.txt > header_and_0.9TB.txt
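Plain cat does the same concatenation, if you prefer it to the perl one-liner:

cat header.txt 0.9TB.txt > header_and_0.9TB.txt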
Let me know if this solution worked for you.
Edit: Steps 2 and 3 can be combined into one:
perl -ne 'print if $. >= 5000000' 1.8TB.txt >> header.txt
mv header.txt 0.9TB.txt
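A quick way to confirm that the header and the copied lines ended up in the same file (just a suggested check):

head -n 35 0.9TB.txt

The first 30 lines should be your header, followed by your separator lines (if you added any) and then the lines copied from 1.8TB.txt.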
Edit 26.05.21:
I tested the following solution with split, and it was orders of magnitude faster.
If you don't have perl, use head to extract the header:
head -n30 1.8TB.txt > header.txt
split -l 5000030 1.8TB.txt 0.9TB.txt
(Note the file with the extension *.txtab created by split.)
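For example, on my 10-million-line test file split produced exactly two chunks, so listing the prefix shows them (aa and ab are split's default suffixes):

ls 0.9TB.txt*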
cat 0.9TB.txtab >> header.txt
mv header.txt header_and_0.9TB.txt
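If you want to verify the result, a line count is cheap compared to the split itself. On my 10-million-line test file, 0.9TB.txtab holds lines 5,000,031 through 10,000,000, i.e. 4,999,970 lines, so together with the 30-line header the merged file should have exactly 5,000,000 lines:

wc -l header_and_0.9TB.txt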