Update: I figured out the reason for the extraneous newline. I created file1 and file2 on a Windows machine. Windows adds <cr><newline> to the end of each line. So, for example, the first record in file1 is not this:

Bill <tab> 25 <newline>

Instead, it is this:

Bill <tab> 25 <cr><newline>

So when I set a[Bill] to $2 I am actually setting it to $2<cr>.

I used a hex editor and removed all of the <cr> symbols in file1 and file2. Now the AWK program works as desired.
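For anyone who wants to check for the same problem without a hex editor, here is a minimal sketch. It assumes a POSIX-like shell with awk; the scratch directory and variable names are only for illustration. It recreates file1 with Windows line endings, counts the lines that end in a carriage return, and shows that $2 is one character longer than it looks until the carriage return is stripped.

```shell
# Recreate file1 with Windows (CRLF) line endings in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Bill\t25\r\nJohn\t24\r\nMary\t21\r\n' > "$tmpdir/file1"

# Count the lines that end in a carriage return; non-zero means CRLF endings.
crlf_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmpdir/file1")
echo "lines ending in CR: $crlf_lines"

# With tab as the field separator, $2 on the first record is "25" plus the CR.
len_before=$(awk -F'\t' 'NR == 1 { print length($2) }' "$tmpdir/file1")

# Strip the trailing CR from every line, then measure $2 again.
awk '{ sub(/\r$/, "") } 1' "$tmpdir/file1" > "$tmpdir/file1.unix"
len_after=$(awk -F'\t' 'NR == 1 { print length($2) }' "$tmpdir/file1.unix")
echo "length of \$2 before: $len_before, after: $len_after"
```

The `sub(/\r$/, "")` step has the same effect as removing the <cr> symbols by hand.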

I have seen the SO posts on using AWK to do a natural join of two files. I took one of the solutions and am trying to get it to work. Alas, I have been unsuccessful. I am hoping you can tell me what I am doing wrong.

Note: I appreciate other solutions, but what I really want is to understand why my AWK program doesn't work (i.e., why/how an extraneous newline is being introduced).

I want to do a join of these two files:

file1 (name, tab, age):

Bill    25
John    24
Mary    21

file2 (name, tab, marital-status)

Bill    divorced
Glenn   married
John    married
Mary    single

When joined, I expect to see this (name, tab, age, tab, marital-status):

Bill    25  divorced
John    24  married
Mary    21  single

Notice that file2 has a person named Glenn, but file1 doesn't. No record in file1 joins to it.

My AWK program almost produces that result. But, for reasons I don't understand, the marital-status value is on the next line:

Bill    25
divorced
John    24
married
Mary    21
single

Here is my AWK program:

awk 'BEGIN { OFS = "\t" }
     NR == FNR { a[$1] = ($1 in a? a[$1] OFS : "")$2; next }
     $1 in a { $0 = $0 OFS a[$1]; delete a[$1]; print }' file2 file1  > joined_file1_file2
Roger Costello
  • Your ternary puts a value in `a` if one is there and puts a blank (plus the status) there if it's empty. I would use an associative array as in [this answer](https://stackoverflow.com/a/73504491/26428) – Dennis Williamson Aug 26 '22 at 19:27
  • Also, if you look at my answer, posted well before your latest edit, I included the sub call precisely because I suspected DOS line endings all along. – anubhava Aug 27 '22 at 04:10
  • See [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it). – Ed Morton Aug 27 '22 at 15:30
  • You should have included in your example a name in file1 that doesn't exist in file2 so we can see how you want that treated. – Ed Morton Aug 27 '22 at 15:31

1 Answer

You may try this awk solution:

awk 'BEGIN {FS=OFS="\t"} {sub(/\r$/, "")}
FNR == NR {m[$1]=$2; next} {print $0, m[$1]}' file2 file1

Bill    25  divorced
John    24  married
Mary    21  single

Here:

  • sub(/\r$/, "") removes the trailing carriage return (DOS line ending), if any, from every record of both files
  • If $1 doesn't exist in the mapping m, then m[$1] is an empty string, so no explicit membership test is needed
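To try this end to end, the following sketch rebuilds both input files with CRLF line endings and runs the same awk program on them. It assumes a POSIX-like shell; the scratch directory and the `joined` variable are only for illustration.

```shell
# Build the two sample files with Windows (CRLF) line endings.
tmpdir=$(mktemp -d)
printf 'Bill\t25\r\nJohn\t24\r\nMary\t21\r\n' > "$tmpdir/file1"
printf 'Bill\tdivorced\r\nGlenn\tmarried\r\nJohn\tmarried\r\nMary\tsingle\r\n' > "$tmpdir/file2"

# First pass (file2): strip the CR, remember status by name.
# Second pass (file1): strip the CR, print the line plus the remembered status.
joined=$(awk 'BEGIN { FS = OFS = "\t" } { sub(/\r$/, "") }
FNR == NR { m[$1] = $2; next } { print $0, m[$1] }' "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$joined"
```

Each output line should carry name, age, and marital status separated by tabs, with no carriage return left in the middle.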
anubhava
  • Although the OP didn't include this possibility, it should be noted that this will print lines from the age file that don't exist in the status file (which could be a desired result). – Dennis Williamson Aug 26 '22 at 19:29
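The difference described in this comment can be seen with a small sketch. It assumes a POSIX-like shell, and the name "Zoe" is a hypothetical record added to file1 that has no match in file2. The first command is the answer's program as written; the second adds a `$1 in m` guard, which is one possible way to print only the names present in both files.

```shell
# file1 contains "Zoe" (hypothetical), who does not appear in file2.
tmpdir=$(mktemp -d)
printf 'Bill\t25\nZoe\t30\n' > "$tmpdir/file1"
printf 'Bill\tdivorced\nGlenn\tmarried\n' > "$tmpdir/file2"

# As written in the answer: every file1 line is printed; Zoe gets an empty status.
left=$(awk 'BEGIN { FS = OFS = "\t" } { sub(/\r$/, "") }
FNR == NR { m[$1] = $2; next } { print $0, m[$1] }' "$tmpdir/file2" "$tmpdir/file1")

# With a membership guard: only names found in both files are printed.
inner=$(awk 'BEGIN { FS = OFS = "\t" } { sub(/\r$/, "") }
FNR == NR { m[$1] = $2; next } $1 in m { print $0, m[$1] }' "$tmpdir/file2" "$tmpdir/file1")

printf 'without guard:\n%s\n' "$left"
printf 'with guard:\n%s\n' "$inner"
```

Which of the two behaviors is correct depends on how unmatched names in file1 should be treated, which the question does not specify.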