1

I have a file containing a big table, something like that :

Header1    Header2    Header3 ... Header8031
Value1     Value2     Value3 .... Value8031
.
.
Value1     Value2     Value3 ...  Value8031

In another file I have a list with some headers of the previous table.

Header1
Header3000
Header5
Header200
Header10

I want to extract the information in the table only for the headers in the list. In other words, getting the columns that match with the headers on the list. [matching the list with the columns id on the tables]

Output

Header1   Header3000 Header5 Header200   Header10
Value1    Value3000  Value5  Value200    Value10
Value1    Value3000  Value5  Value200    Value10

I tried some examples with awk (AWK extract columns from file based on header selected from 2nd file), but I'm not able to get my desired output.

Community
  • 1
  • 1
user2245731
  • 449
  • 3
  • 7
  • 16
  • Where are the columns corresponding to Header3000 and Header200 in the output? Is there a reason that they should be skipped? The linked question seems like the perfect match, could you be more specific about what problems you're having with the solutions provided there? – Tom Fenech Oct 28 '14 at 10:31
  • Not skipped, it was an error. From the linked solution I only get the first column of my desired output. – user2245731 Oct 28 '14 at 10:39
  • 1
    It would be useful to include your attempt based on the answer there and explain what the problem is in your question as well. – Tom Fenech Oct 28 '14 at 10:39

2 Answers2

3

I would use a little script like this:

FNR==NR {a[$1]; next}
FNR==1 { for (i=1;i<=NF;i++) if ($i in a) b[i] }
{ for (i=1; i<=NF; i++) if (i in b) printf "%s%s", $i, FS
print ""
}

Explanation

  • First read the second file and store the name of the columns.
  • Then read the first file.
    • On the first line, store the column number of the columns we want to print.
    • From then on, print those desired column numbers.

Test

$ cat f1
Header1    Header2    Header3 Header8031
Value1     Value2     Value3 Value8031
Value1     Value2     Value3 Value8031
$ cat f2
Header1
Header3000
Header5
Header200
Header10
Header3

Test:

$ awk -f a.awk f2 f1
Header1 Header3 
Value1 Value3 
Value1 Value3 
fedorqui
  • 275,237
  • 103
  • 548
  • 598
1

this awk line would work for you:

awk 'NR==FNR{a[$0]=7;next}FNR==1{for(i=1;i<=NF;i++)if(a[$i])c[++x]=i}
    {for(i=1;i<=x;i++)printf "%s%s", $(c[i]), (i==x?RS:FS)}' headerFile dataFile

test with example:

kent$  head col f
==> col <==
Header1
Header3
Header5

==> f <==
Header1   Header2   Header3   Header4   Header5  Header10
Value1    Value2    Value3    Value4    VAlue5   Value10
Value1    Value2    Value3    Value4    Value5   Value10

kent$  awk 'NR==FNR{a[$0]=7;next}FNR==1{for(i=1;i<=NF;i++)if(a[$i])c[++x]=i}
        {for(i=1;i<=x;i++)printf "%s%s", $(c[i]), (i==x?RS:FS)}' col f
Header1 Header3 Header5
Value1 Value3 VAlue5
Value1 Value3 Value5
Kent
  • 189,393
  • 32
  • 233
  • 301