1

I need to create a file from two input files using a Linux command.

input 1:

21 33210001 rs60180678 G T 100 PASS AVGPOST=1.0000;RSQ=0.9885;THETA=0.0002;AA=G;AN=2184;VT=SNP;LDAF=0.0019;SNPSOURCE=LOWCOV;AC=4;ERATE=0.0003;AF=0.0018;AFR_AF=0.01 GT:DS:GL

input 2:

21 33210001 . G T . . ;AA=0.0163934;AFE=0;ASNE=0;EUN=0;AFW=0.0113636;MED=0;LAT=0;VT=SNP;AF=0.0018

expected output:

21 33210001 rs60180678 G T . . ;AA=0.0163934;AFE=0;ASNE=0;EUN=0;AFW=0.0113636;MED=0;LAT=0;VT=SNP;AF=0.0018

The columns in both input files are separated by tabs.

A line of input 2 gets the ID (3rd column) from input 1 when the 1st, 2nd, 4th and 5th columns match.

The columns of the output file should also be separated by tabs.

Steve
AKR

2 Answers

2

Here's one way with awk:

awk 'BEGIN { FS=OFS="\t" } FNR==NR { a[$1,$2,$4,$5]=$3; next } ($1,$2,$4,$5) in a { $3=a[$1,$2,$4,$5] }1' file1 file2

Results:

21 33210001 rs60180678 G T . . ;AA=0.0163934;AFE=0;ASNE=0;EUN=0;AFW=0.0113636;MED=0;LAT=0;VT=SNP;AF=0.0018
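
To see how the one-liner works, here is a commented sketch of the same logic. It recreates the two sample rows from the question as `file1` and `file2`; the output name `file3` and the array name `id` are assumptions chosen for illustration.

```shell
# Recreate the two sample rows from the question (fields separated by tabs).
printf '21\t33210001\trs60180678\tG\tT\t100\tPASS\tAVGPOST=1.0000;RSQ=0.9885;THETA=0.0002;AA=G;AN=2184;VT=SNP;LDAF=0.0019;SNPSOURCE=LOWCOV;AC=4;ERATE=0.0003;AF=0.0018;AFR_AF=0.01\tGT:DS:GL\n' > file1
printf '21\t33210001\t.\tG\tT\t.\t.\t;AA=0.0163934;AFE=0;ASNE=0;EUN=0;AFW=0.0113636;MED=0;LAT=0;VT=SNP;AF=0.0018\n' > file2

awk '
# Read and write tab-separated fields.
BEGIN { FS = OFS = "\t" }
# FNR == NR holds only while the first file is read: store the ID (column 3)
# under a key built from columns 1, 2, 4 and 5, then skip to the next line.
FNR == NR { id[$1, $2, $4, $5] = $3; next }
# Second file: when the same key was stored, copy the ID into column 3.
($1, $2, $4, $5) in id { $3 = id[$1, $2, $4, $5] }
# Print every line of the second file, changed or not.
{ print }
' file1 file2 > file3

cat file3
```

Lines of `file2` that have no match in `file1` are printed unchanged, so they keep the `.` in the 3rd column.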
Steve
  • I forgot to add that both input files are gzip-compressed (**file1.vcf.gz** and **file2.vcf.gz**), and the output should be **file3.vcf.gz**. – AKR Nov 28 '12 at 06:17
  • 1
    @user1782877: Simply change `file1 file2` to: `<(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz) | gzip > output.vcf.gz` – Steve Nov 28 '12 at 06:41
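
The gzip variant from the comment above can be tried end to end with the sketch below. It assumes bash (process substitution `<(...)` is not available in plain POSIX `sh`) and builds the compressed test inputs from the sample rows in the question; the file names follow the comment.

```shell
#!/usr/bin/env bash
# Build gzip-compressed test inputs from the sample rows in the question.
printf '21\t33210001\trs60180678\tG\tT\t100\tPASS\tAVGPOST=1.0000;RSQ=0.9885;THETA=0.0002;AA=G;AN=2184;VT=SNP;LDAF=0.0019;SNPSOURCE=LOWCOV;AC=4;ERATE=0.0003;AF=0.0018;AFR_AF=0.01\tGT:DS:GL\n' | gzip > input1.vcf.gz
printf '21\t33210001\t.\tG\tT\t.\t.\t;AA=0.0163934;AFE=0;ASNE=0;EUN=0;AFW=0.0113636;MED=0;LAT=0;VT=SNP;AF=0.0018\n' | gzip > input2.vcf.gz

# Same awk program as in the answer; each input is decompressed on the fly
# and the merged result is compressed again on the way out.
awk 'BEGIN { FS=OFS="\t" } FNR==NR { a[$1,$2,$4,$5]=$3; next } ($1,$2,$4,$5) in a { $3=a[$1,$2,$4,$5] }1' \
    <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz) | gzip > output.vcf.gz

# Inspect the result without leaving an uncompressed copy on disk.
gzip -dc output.vcf.gz
```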
0

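Another solution, which pairs the lines of the two files by position and therefore assumes both files list the same rows in the same order: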

awk 'BEGIN{FS=OFS="\t"}{getline a < "file2"; split(a,b,"\t");print $1,$2,$3,$4,$5,b[6],b[7],b[8]}' file1
Tedee12345