Here's a bash script that does that in a single pass:
# Count the columns by counting the fields in the header line.
Columns=$(head -n 1 train.csv | tr ',' '\n' | wc -l)
mkdir -p cols
# Skip the header line, then append each field of every row to its column file.
tail -n +2 train.csv |
while IFS=, read -ra row; do
    for i in $(seq 1 "$Columns"); do
        printf '%s\n' "${row[i-1]}" >> "cols/col_$i.txt"
    done
done
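As a quick sanity check after the script finishes (assuming file names as written by the script above), each column file should contain one line per data row:

wc -l < cols/col_1.txt    # expect: number of lines in train.csv minus the header
ls cols | wc -l           # expect: 5000, one file per column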
The disadvantage of this script is that the >> redirection opens and closes a column file for every single field, which over millions of rows adds up to an enormous number of open/close calls. The following Perl script avoids that overhead by keeping all of the column files open:
#!/usr/bin/perl
use strict;
use warnings;

my @handles;

open my $fh, '<', 'train.csv' or die "cannot open train.csv: $!";
<$fh>;    # skip the header line

mkdir 'cols' unless -d 'cols';    # the output directory must exist

while (<$fh>) {
    chomp;
    my @values = split /,/;
    for my $i (0 .. $#values) {
        # Open each column file once, on first use, and keep it open.
        if (!defined $handles[$i]) {
            open $handles[$i], '>', 'cols/col_' . ($i + 1) . '.txt'
                or die "cannot open cols/col_" . ($i + 1) . ".txt: $!";
        }
        print { $handles[$i] } "$values[$i]\n";
    }
}

close $fh;
close $_ for @handles;
Since you have 5000 columns, this script keeps 5001 files open at once (5000 column files plus the input), which exceeds the default per-process limit on most systems, so you will need to raise the number of open file descriptors your system allows.
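As a sketch (the default limit and the way to raise it permanently vary by system; the script file name here is hypothetical), you can check and raise the soft limit in the shell that launches the script:

ulimit -n             # show the current soft limit (often 1024 on Linux)
ulimit -n 5100        # raise it for this shell and its children; must not exceed the hard limit (ulimit -Hn)
perl split_cols.pl    # then run the Perl script above

If 5100 is above your hard limit, raising it usually requires root, e.g. via /etc/security/limits.conf on Linux.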