I want to split a space-delimited file of genomic data, 800,000 columns by 40,000 rows (118 GB in total), into a series of files with 100 columns each.
I am currently running 15 parallel instances of the following bash script, each covering a different column range (invocation sketched after the script):
infile="$1"
start=$2
end=$3
step=$(($4-1))
for((curr=$start, start=$start, end=$end; curr+step <= end; curr+=step+1)); do
cut -f$curr-$((curr+step)) "$infile" > "${infile}.$curr" -d' '
done
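
For context, the parallel invocation looks roughly like this (split.sh and the exact ranges are illustrative, not the literal commands I ran):

jobs=15
total=800000
per=$(( (total + jobs - 1) / jobs ))   # columns per job, rounded up
for ((i = 0; i < jobs; i++)); do
    s=$(( i * per + 1 ))
    e=$(( (i + 1) * per ))
    (( e > total )) && e=$total        # clamp the last job's range
    ./split.sh "$infile" "$s" "$e" 100 &
done
wait   # block until all 15 jobs finish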
However, judging by the script's progress so far, it will take around 300 days to complete the split?! I suspect the reason is that every cut invocation re-reads the entire 118 GB file, and with 800,000 / 100 = 8,000 output chunks that amounts to thousands of full passes over the data.
Is there a more efficient way to split a space-delimited file column-wise into smaller chunks?
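
One alternative I am wondering about is making a single pass with awk, appending each row's slice to every chunk file as the row streams by, instead of re-reading the input once per chunk. A minimal sketch, assuming GNU awk (which transparently manages more open output files than the OS descriptor limit allows) and reusing the ${infile}.<start> naming from my script:

# Single pass: for each input row, write its slice into every chunk file.
# Assumes gawk; with ~8,000 chunk files, other awks may hit fd limits.
awk -v cols=100 '{
    for (start = 1; start <= NF; start += cols) {
        end = start + cols - 1
        if (end > NF) end = NF
        line = $start
        for (i = start + 1; i <= end; i++) line = line OFS $i
        # ">" truncates each file on first write, then appends for the run
        print line > (FILENAME "." start)
    }
}' "$infile"

I have not benchmarked this, and I am unsure whether splitting an 800,000-field record per row in awk is fast enough in practice, hence the question.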