
I am trying to adapt a command from "awk compare columns from two files, impute values of another column". I have also looked at various questions similar to mine ("awk search column from one file, if match print columns from both files", "How to import fields in other columns corresponding to one common field in two files with `NA` in all unmatched columns", and "awk compare 2 files, 2 fields different order in the file, print or merge match and non match lines"), which deal with files that have more fields, but I cannot get it to work. I also read http://theunixshell.blogspot.com/2012/12/i-have-two-files-file-1-contains-3.html to see if it would work, but I am still having trouble:

File 1:

xx NC1 12 13 ! pro

xy NC1 15 17 ! pro

yx NC1 18 20 ! pro

yy NC1 22 28 ! pro

File 2

xx ds

xy jt

yy wp

desired output:

xx NC1 12 13 ! pro ds

xy NC1 15 17 ! pro jt

yx NC1 18 20 ! pro NA

yy NC1 22 28 ! pro wp

The code I am using:

awk 'NR==FNR { a[$1]=$6; next }{print $0 "   "  ($2 in a ? a[$2] : "NA")}' file2 file1 

So basically my output gets a new column in which every value is "NA", which is obviously not what I am trying to get.

output:

xx NC1 12 13 ! pro NA

xy NC1 15 17 ! pro NA

yx NC1 18 20 ! pro NA

yy NC1 22 28 ! pro NA
JuDoe

1 Answer


You are close.

awk 'NR==FNR {a[$1]=$2;next}{print $0, ($1 in a ? a[$1]:"NA")}' f2 f1

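Your problem is that you passed file2 as the first argument but wrote the first block as if it were reading file1. file2 has no $6 at all, so every stored value is empty. The second block then looks up $2 of file1 (NC1), which is never a key in the array, so every line gets NA.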
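A minimal way to check this, assuming the sample data from the question is saved as `f1` and `f2` (the temporary directory and the `result` variable are only there for illustration):

```shell
# Use a throwaway directory so no existing files are overwritten.
tmp=$(mktemp -d)

# f1: the six-column file from the question.
cat > "$tmp/f1" <<'EOF'
xx NC1 12 13 ! pro
xy NC1 15 17 ! pro
yx NC1 18 20 ! pro
yy NC1 22 28 ! pro
EOF

# f2: the two-column lookup file.
cat > "$tmp/f2" <<'EOF'
xx ds
xy jt
yy wp
EOF

# f2 is read first (NR==FNR) to fill the array; f1 is read second and printed.
result=$(awk 'NR==FNR {a[$1]=$2; next} {print $0, ($1 in a ? a[$1] : "NA")}' "$tmp/f2" "$tmp/f1")
printf '%s\n' "$result"
```

With this data the printed lines should match the desired output in the question, with `NA` only on the `yx` line.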

Kent
  • It worked, thank you. Do you mind providing me an explanation of what each part does so I can have a better understanding? – JuDoe Jul 04 '18 at 22:23
  • @JuDoe You wrote the original code in your question, so I assumed you understood it. Which part are you having trouble understanding? – Kent Jul 05 '18 at 07:35
  • I wrote it while barely understanding it. This is what I understood and was able to find by searching the web: NR==FNR is used when you work with two files. Is a[$1] creating an array from the first field? Does a[$1]=$2 suggest that the file has two columns? I read that next tells awk not to process any further commands, to read in the next record and start over, but honestly I do not understand what it is trying to do there. Does print $0 act as a condition for when there is nothing to show in that field? – JuDoe Jul 05 '18 at 13:33
  • @JuDoe `NR==FNR{a[$1]=$2;next}` runs only on the first file (`file2`) and builds a hash table (key = 1st column, value = 2nd column). Once that file has been fully processed, it is the turn of the 2nd file (`file1`), which is handled only by `{print $0, ($1 in a ? a[$1]:"NA")}`: it checks whether column 1 is already in the hash table (named `a`); if so it prints the stored value, otherwise it prints `NA`. – Kent Jul 05 '18 at 13:48