Update: I figured out the reason for the extraneous newline. I created file1
and file2
on a Windows machine. Windows adds <cr><newline>
to the end of each line. So, for example, the first record in file1 is not this:
Bill <tab> 25 <newline>
Instead, it is this:
Bill <tab> 25 <cr><newline>
So when I set a[Bill]
to $2
I am actually setting it to $2<cr>
.
I used a hex editor and removed all of the <cr>
symbols in file1
and file2
. Now the AWK program works as desired.
I have seen the SO posts on using AWK to do a natural join of two files. I took one of the solutions and am trying to get it to work. Alas, I have been unsuccessful. I am hoping you can tell me what I am doing wrong.
Note: I appreciate other solutions, but what I really want is to understand why my AWK program doesn't work (i.e., why/how an extraneous newline is being introduced).
I want to do a join of these two files:
file1 (name, tab, age):
Bill 25
John 24
Mary 21
file2 (name, tab, marital-status)
Bill divorced
Glenn married
John married
Mary single
When joined, I expect to see this (name, tab, age, tab, marital-status):
Bill 25 divorced
John 24 married
Mary 21 single
Notice that file2 has a person named Glenn, but file1 doesn't. No record in file1 joins to it.
My AWK program almost produces that result. But, for reasons I don't understand, the marital-status value is on the next line:
Bill 25
divorced
John 24
married
Mary 21
single
Here is my AWK program:
awk 'BEGIN { OFS = '\t' }
NR == FNR { a[$1] = ($1 in a? a[$1] OFS : "")$2; next }
$1 in a { $0 = $0 OFS a[$1]; delete a[$1]; print }' file2 file1 > joined_file1_file2