
I am trying to use awk to skip every line up to and including the one that matches /^#CHROM/, and start processing on the line below it. The awk command below does execute, but it currently returns all lines in the tab-delimited file. Thank you :).

file

##INFO=<ID=ANN,Number=1,Type=Integer,Description="My custom annotation">
##source_20170530.1=vcf-annotate(r953) -d key=INFO,ID=ANN,Number=1,Type=Integer,Description=My custom annotation -c CHROM,FROM,TO,INFO/ANN
##INFO=<ID=,Number=A,Type=Float,Description="Variant quality">
#CHROM  POS ID  REF ALT
chr1    948846  .   T   TA  NA  NA
chr2    948852  .   T   TA  NA  NA
chr3    948888  .   T   TA  NA  NA

awk

awk -F'\t' -v OFS="\t" 'NR>/^#CHROM/ {print $1,$2,$3,$4,$5,"ID=1"$6,"ID=2"$7}' file

desired output

chr1    948846  .   T   TA  ID1=NA  ID2=NA
chr2    948852  .   T   TA  ID1=NA  ID2=NA
chr3    948888  .   T   TA  ID1=NA  ID2=NA
Timur Shtatland
justaguy
  • try `/^#CHROM/{f=1;next} f{print ...}` or `f{print ...} /^#CHROM/{f=1}` – Sundeep Jun 14 '17 at 15:43
  • `/^#CHROM/` is a regexp that's equivalent to `$0~/^#CHROM/`. Do you think the result of that comparison is a line number or a boolean true/false? If a boolean, why are you comparing it to the current line number (NR)? If a line number, what do you think the result of testing to see if the current line number is greater than the current line number would be? – Ed Morton Jun 14 '17 at 17:28
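
To make the point of that comment concrete, here is a small sketch (the three-line sample is made up for illustration, it is not the asker's file). It prints, for each line, NR, the 0/1 value of the regexp match, and the result of comparing the two. Since NR starts at 1 and the match is only ever 0 or 1, the comparison can be false only when the very first line matches, which would explain why every line of the question's file gets printed.

```shell
# Hypothetical sample: a meta line, the header line, one data line.
sample=$(printf '##meta\n#CHROM\tPOS\nchr1\t948846\n')

# m is 1 when the line matches /^#CHROM/, otherwise 0.
# The last column is the value of the comparison NR > m.
result=$(printf '%s\n' "$sample" |
  awk '{ m = ($0 ~ /^#CHROM/); print NR, m, (NR > m) }')

printf '%s\n' "$result"
```

The last column is 1 on every line here, including the header line itself.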

2 Answers


Use the following awk approach:

awk -v OFS="\t" '/^#CHROM/{ r=NR }r && NR>r{ $6="ID=1"$6; $7="ID=2"$7; print }' file

The output:

chr1    948846  .   T   TA  ID=1NA  ID=2NA
chr2    948852  .   T   TA  ID=1NA  ID=2NA
chr3    948888  .   T   TA  ID=1NA  ID=2NA

  • /^#CHROM/{ r=NR } - stores the line number of the #CHROM header line in r; the second rule (r && NR>r) then runs only on the lines after it

An alternative approach uses a flag instead of a line number:

awk -v OFS="\t" '/^#CHROM/{ f=1; next }f{ $6="ID=1"$6; $7="ID=2"$7; print }' file
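
Note that the commands above produce ID=1NA, which mirrors the prefixes in the question's own command, while the desired output in the question shows ID1=NA. A minimal sketch of the flag approach with the prefixes adjusted to match the desired output follows. The sample data is an assumption (a shortened, tab-separated stand-in for the asker's file, fed through a pipe instead of a file), and FS is set to a tab here on the assumption that the real file is strictly tab-delimited.

```shell
# Hypothetical sample mirroring the question's layout, tab-separated.
sample=$(printf '##INFO=<ID=ANN>\n#CHROM\tPOS\tID\tREF\tALT\n')
sample="$sample
$(printf 'chr1\t948846\t.\tT\tTA\tNA\tNA\nchr2\t948852\t.\tT\tTA\tNA\tNA\n')"

# Set the flag on the header line and skip it; once the flag is set,
# prefix fields 6 and 7 and print the rebuilt record.
out=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" }
/^#CHROM/ { f = 1; next }
f { $6 = "ID1=" $6; $7 = "ID2=" $7; print }')

printf '%s\n' "$out"
```

Only the two data lines come out, each ending in ID1=NA and ID2=NA.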
RomanPerekhrest
awk 'BEGIN{FS=OFS="\t"} f{print $1,$2,$3,$4,$5,"ID1="$6,"ID2="$7} /^#CHROM/{f=1}' file

See https://stackoverflow.com/a/17914105/1745001 for details on this and other awk search idioms. Yours is a variant of "b" on that page.
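
The order of the two rules is what decides whether the matching line itself is processed. A short sketch of that, using a made-up four-line sample and a bare `f` pattern (which just prints the line) in place of the longer print statement:

```shell
# Hypothetical sample: a meta line, the header line, two data lines.
sample=$(printf '##meta\n#CHROM\tPOS\nchr1\t1\nchr2\t2\n')

# Flag set BEFORE the bare `f` pattern: the header line itself is printed too.
inclusive=$(printf '%s\n' "$sample" | awk '/^#CHROM/ { f = 1 } f')

# Flag set AFTER the bare `f` pattern: printing starts on the following line.
exclusive=$(printf '%s\n' "$sample" | awk 'f; /^#CHROM/ { f = 1 }')

printf 'inclusive:\n%s\n\nexclusive:\n%s\n' "$inclusive" "$exclusive"
```

With the flag set last, no `next` is needed to keep the header line out of the output.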

Ed Morton