
I need to split a file with 250k columns into several (~5) chunks, based on size (preferably) or on the number of columns. I am aware of the split command for row-wise splitting, but I don't know of a similar tool for column-wise splitting. The number of columns in my file doesn't divide evenly, so the chunks cannot all have the same number of columns.

Input:

AA BB CC DD EE FF GG HH II JJ KK LL MM
NN OO PP QQ RR SS TT UU VV WW XX YY ZZ

Desired output:

File1

AA BB CC DD
NN OO PP QQ

File2

EE FF GG HH
RR SS TT UU

File3

II JJ KK LL MM
VV WW XX YY ZZ 
user2162153

3 Answers

Using awk. You can adjust n to the number of columns you want per output file:

awk '{for (i=1; i<=NF; i++) {
         # end the line after every n-th field and after the last field, otherwise print FS
         printf "%s%s", $i, ((i % n == 0 || i == NF) ? RS : FS) > ("File" int((i-1)/n+1) ".txt")
      }}' n=5 file
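
With the two-line sample input from the question and n=5, the result should look like this:

$ head File*.txt
==> File1.txt <==
AA BB CC DD EE
NN OO PP QQ RR

==> File2.txt <==
FF GG HH II JJ
SS TT UU VV WW

==> File3.txt <==
KK LL MM
XX YY ZZ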
BMW

Use cut. It's part of GNU coreutils.

Assuming your input file columns are separated by a space:

cut -d " " -f1-4 /path/to/input/file > file1

cut -d " " -f5-8 /path/to/input/file > file2

...

See the man page man cut for more information.
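
With ~250k columns, typing each range by hand isn't practical, but a small shell loop can generate the cut calls for you. This is only a rough sketch, assuming space-separated fields, the same number of columns on every row, and a chunk width of 5 fields per output file:

infile=/path/to/input/file
width=5                                             # fields per output file
ncols=$(head -n1 "$infile" | tr ' ' '\n' | wc -l)   # column count, taken from the first row
start=1; i=1
while [ "$start" -le "$ncols" ]; do
    end=$((start + width - 1))
    cut -d ' ' -f "$start-$end" "$infile" > "file$i"
    start=$((end + 1))
    i=$((i + 1))
done

cut silently ignores the part of the last range that runs past the final column, so the loop needs no special case for the last chunk.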

Ken

I would use awk for this. I'm not sure you really want 5 columns per file: you mentioned 250k columns, which would mean creating 50k files. But here is something to get you started:

awk '{
  y=1                                     # index of the current output file
  for(i=1;i<NF;i++) {
    if(i%5==0) {
      print $i > ("text" y ".txt")        # every 5th field ends a line and its chunk
      y+=1                                # move on to the next output file
      continue
    }
    printf "%s ",$i > ("text" y ".txt")   # fields inside a chunk, space-separated
  }
  print $NF > ("text" y ".txt")           # the last field always ends the current line
}' file

Test:

$ cat file
AA BB CC DD EE FF GG HH II JJ KK LL MM
NN OO PP QQ RR SS TT UU VV WW XX YY ZZ

$ awk '{
  y=1
  for(i=1;i<NF;i++) { 
    if(i%5==0) {
      print $i > "text"y".txt"
      y+=1
      continue 
    }
  printf "%s ",$i >"text"y".txt"
  } 
print $NF > "text"y".txt"}' file

$ head text*
==> text1.txt <==
AA BB CC DD EE
NN OO PP QQ RR

==> text2.txt <==
FF GG HH II JJ
SS TT UU VV WW

==> text3.txt <==
KK LL MM
XX YY ZZ
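
Since the question actually asks for a fixed number of chunks (~5) rather than a fixed number of columns per file, a variation is to compute the chunk width from NF. This is just a sketch along the same lines, assuming every row has the same number of fields; the part*.txt names are placeholders:

awk -v chunks=5 '{
  width = int((NF + chunks - 1) / chunks)            # ceiling of NF/chunks fields per file
  for (i = 1; i <= NF; i++) {
    out = "part" int((i - 1) / width + 1) ".txt"     # which chunk field i falls into
    printf "%s%s", $i, ((i % width == 0 || i == NF) ? ORS : OFS) > (out)
  }
}' file

With chunks=5 this creates only as many output files as there are chunks, so the 50k-file concern mentioned above doesn't arise.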
jaypal singh