Doing all the columns in a single pass is much more efficient than rescanning the entire input file for each column.
    awk -F , 'NR==1 { ncols = split($0, cols, /,/); next }
        { # print each value only the first time it occurs in its column
          for (i=1; i<=ncols; ++i)
              if (!seen[i ":" $i]++)
                  print $i >> (cols[i] ".csv") }' CROWN.csv
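To see what the script produces, here is a toy run with a made-up two-column CROWN.csv (the data is purely illustrative and not from your question); each column ends up in a file named after its header, holding that column's distinct values:

    $ cat CROWN.csv
    name,city
    alice,london
    bob,paris
    alice,london
    $ # ...run the awk script above...
    $ cat name.csv
    alice
    bob
    $ cat city.csv
    london
    paris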
Because awk keeps one output file open per column, a very wide input can run into the per-process limit on open file handles. If this is going to be part of a bigger task, maybe split the input file into several temporary files, each with fewer columns than the number of open file handles permitted on your system, rather than fix this script to handle an arbitrary number of columns.
You can inspect this limit with `ulimit -n`; on some systems, you can increase it, either by tweaking the system configuration or, in the worst case, by recompiling the kernel. (Your question doesn't identify your platform, but this should be easy enough to google.)
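For illustration, here is a minimal POSIX shell sketch of that pre-splitting step; the chunk size of 200 and the chunk-*.csv file names are my own assumptions, not anything from your question. Like the awk script, it assumes plain CSV with no quoted commas inside fields.

    #!/bin/sh
    # Sketch: carve CROWN.csv into column chunks of at most $chunk columns each,
    # so that each awk pass stays well below the open-file limit (ulimit -n).
    chunk=200    # assumed chunk size; keep it comfortably below `ulimit -n`
    ncols=$(head -n 1 CROWN.csv | awk -F , '{ print NF }')
    start=1
    part=1
    while [ "$start" -le "$ncols" ]; do
        end=$((start + chunk - 1))
        cut -d , -f "$start-$end" CROWN.csv > "chunk-$part.csv"
        start=$((end + 1))
        part=$((part + 1))
    done
    # ...then run the single-pass awk script above on each chunk-*.csv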
Addendum: I created a quick and dirty timing comparison of these answers at https://ideone.com/dnFj41; I encourage you to fork it and experiment with different shapes of input data. With an input file of 100 columns and (probably) no duplication in the columns -- but only a few hundred rows -- I got the following results:
- 0.001s Baseline test -- simply copy input file to an identical output file
- 0.242s tripleee -- this single-pass AWK script
- 0.561s Sorin -- multiple passes using simple shell script
- 2.154s Mihir -- multiple passes using AWK
Unfortunately, Carmen's answer could not be tested, because I did not have permissions to install Text::CSV_XS on Ideone.
An earlier version of this answer contained a Python attempt, but I was too lazy to finish debugging it. It's still there in the edit history if you are curious.