
I have two files. The first is a large text file with multiple columns, in a format similar to the following:

Col1 Col2 A B C D G H I L N Q S ...
1    0    1 0 1 0 1 0 1 0 1 0 1 ...
0    2    1 0 0 0 0 0 2 1 2 3 4 ...
...

and the second is a text file listing the column titles to match:

Col1
Col2
Col3 
B
D
F
...

(note that some lines in this file may not match any column title in the large TSV file)

I want to cut the large file down to the columns whose titles match those listed in the second file (ignoring titles in either file that have no match), so the output file's column titles start Col1 Col2 B D ...

Is there an easy way of doing this rather than looping over each line in the second text file and building a file using paste?

This is on Mac OS X, so using ksh, although pure bash could be used instead.

1 Answer


Install Miller from Homebrew (brew install miller), then:

mlr --ipprint --oxtab cat large.txt \
| grep -f <(sed 's/^/^/; s/\./\\./g' cols.txt; echo '^$') \
| mlr --ixtab --opprint cat
Col1 Col2 B D ...
1    0    0 0 ...
0    2    0 0 ...
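
The --oxtab step turns each row into a block of key/value lines, one per column, with a blank line between records, which is what makes grep usable here. With the sample input above, the intermediate stream looks roughly like this:

Col1 1
Col2 0
A    1
B    0
...

Col1 0
Col2 2
...

The sed turns each name in cols.txt into an anchored pattern (^Col1, ^B, ...), escaping any literal dots, and the echo '^$' adds a pattern that keeps the blank record separators so the second mlr invocation can rebuild the rows.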

Or, with plain awk

awk '
  # first file: remember each wanted column title
  NR==FNR {col[$1]; next}
  # header line of the second file: mark the matching column numbers
  FNR == 1 {for (i=1; i<=NF; i++) if ($i in col) wanted[i] = 1}
  # every line of the second file (including the header): print only the marked columns
  {
    for (i=1; i<=NF; i++) if (wanted[i]) printf "%s%s", $i, OFS
    print ""
  }
' cols.txt large.txt
Col1 Col2 B D ...
1 0 0 0 ...
0 2 0 0 ...
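
Note that the loop above prints a field separator after every field, so each output line ends with a trailing space. If that matters, a minimal variant (assuming the same cols.txt and large.txt) records the wanted column numbers in order and prints separators only between fields:

awk '
  NR==FNR {col[$1]; next}
  FNR == 1 {for (i=1; i<=NF; i++) if ($i in col) wanted[++n] = i}
  # separator between fields, newline after the last one
  {for (j=1; j<=n; j++) printf "%s%s", $(wanted[j]), (j<n ? OFS : ORS)}
' cols.txt large.txt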