Extracting columns from a file

Question

I have a file containing a big table, something like that :

Header1    Header2    Header3 ... Header8031
Value1     Value2     Value3 .... Value8031
.
.
Value1     Value2     Value3 ...  Value8031

In another file I have a list with some headers of the previous table.

Header1
Header3000
Header5
Header200
Header10

I want to extract the information in the table only for the headers in the list. In other words, getting the columns that match with the headers on the list. [matching the list with the columns id on the tables]

Output

Header1   Header3000 Header5 Header200   Header10
Value1    Value3000  Value5  Value200    Value10
Value1    Value3000  Value5  Value200    Value10

I tried some examples with awk (AWK extract columns from file based on header selected from 2nd file), but I'm not able to get my desired output.

Where are the columns corresponding to Header3000 and Header200 in the output? Is there a reason that they should be skipped? The linked question seems like the perfect match, could you be more specific about what problems you're having with the solutions provided there? — Tom Fenech, Oct 28 '14 at 10:31
Not skipped, it was an error. From the linked solution I only get the first column of my desired output. — user2245731, Oct 28 '14 at 10:39
It would be useful to include your attempt based on the answer there and explain what the problem is in your question as well. — Tom Fenech, Oct 28 '14 at 10:39

score 3 · Answer 1 · answered Oct 28 '14 at 10:51

I would use a little script like this:

FNR==NR {a[$1]; next}
FNR==1 { for (i=1;i<=NF;i++) if ($i in a) b[i] }
{ for (i=1; i<=NF; i++) if (i in b) printf "%s%s", $i, FS
print ""
}

Explanation

First read the second file and store the name of the columns.
Then read the first file.
- On the first line, store the column number of the columns we want to print.
- From then on, print those desired column numbers.

Test

$ cat f1
Header1    Header2    Header3 Header8031
Value1     Value2     Value3 Value8031
Value1     Value2     Value3 Value8031
$ cat f2
Header1
Header3000
Header5
Header200
Header10
Header3

Test:

$ awk -f a.awk f2 f1
Header1 Header3 
Value1 Value3 
Value1 Value3

score 1 · Accepted Answer · answered Oct 28 '14 at 10:47

this awk line would work for you:

awk 'NR==FNR{a[$0]=7;next}FNR==1{for(i=1;i<=NF;i++)if(a[$i])c[++x]=i}
    {for(i=1;i<=x;i++)printf "%s%s", $(c[i]), (i==x?RS:FS)}' headerFile dataFile

test with example:

kent$  head col f
==> col <==
Header1
Header3
Header5

==> f <==
Header1   Header2   Header3   Header4   Header5  Header10
Value1    Value2    Value3    Value4    VAlue5   Value10
Value1    Value2    Value3    Value4    Value5   Value10

kent$  awk 'NR==FNR{a[$0]=7;next}FNR==1{for(i=1;i<=NF;i++)if(a[$i])c[++x]=i}
        {for(i=1;i<=x;i++)printf "%s%s", $(c[i]), (i==x?RS:FS)}' col f
Header1 Header3 Header5
Value1 Value3 VAlue5
Value1 Value3 Value5

@user2245731 same as `a[$0]=1`, later I checked `if(a[$i])` important is not zero. — Kent, Oct 28 '14 at 11:06

Extracting columns from a file

2 Answers2

Explanation

Test

Linked