#!/bin/bash
awk 'FNR > 1 && NR==FNR {
  f_name = tolower($1) # first name
  l_name = tolower($2) # last name
  full_name = f_name":"l_name # full name used as array index
  address_city_county_state_zip = $4","$5","$6","$7","$8
  address[full_name] = address_city_county_state_zip
  next
  }
  {
    first_n = tolower($1) # first name
    last_n = tolower($2) # last name
    full_n = first_n":"last_n # full name used as array index
    printf ("%-10s %-10s %-55s %06.2f %-10s\n", $1, $2, address[full_n], $4, FILENAME)          
  }' $1 $2

./ch10_challenge.awk addresses.txt nameemailavg.tab

Input file nameemailavg.tab:

printf 'Art\tVenere\tart@venere.org\t256.62394383\nLenna\tPaprocki\tlpaprocki@hotmail.com\t259.688783099\nDonette\tFoller\tdonette.foller@cox.net\t282.32979844\n' > nameemailavg.tab

Input file addresses.txt:

printf 'first_name\tlast_name\tcompany_name\taddress\tcity\tcounty\tstate\tzip\tphone1\tphone2\temail\tweb\nAbel\tMaclead\tRangoni Of Florence\t37275 St  Rt 17m M\tMiddle Island\tSuffolk\tNY\t11953\t631-335-3414\t631-677-3675\tamaclead@gmail.com\thttp://www.rangoniofflorence.com\nArt\tVenere\tChemel, James L Cpa\t8 W Cerritos Ave #54\tBridgeport\tGloucester\tNJ\t08014\t856-636-8749\t856-264-4130\tart@venere.org\thttp://www.chemeljameslcpa.com\nDonette\tFoller\tPrinting Dimensions\t34 Center St\tHamilton\tButler\tOH\t45011\t513-570-1893\t513-549-4561\tdonette.foller@cox.net\thttp://www.printingdimensions.com\nSimona\tMorasca\tChapman, Ross E Esq\t3 Mcauley Dr\tAshland\tAshland\tOH\t44805\t419-503-2484\t419-800-6759\tsimona@morasca.com\thttp://www.chapmanrosseesq.com\nKiley\tCaldarera\tFeiner Bros\t25 E 75th St #69\tLos Angeles\tLos Angeles\tCA\t90034\t310-498-5651\t310-254-3084\tkiley.caldarera@aol.com\thttp://www.feinerbros.com\n' > addresses.txt
$ cat nameemailavg.tab
Art     Venere  art@venere.org  256.62394383
Lenna   Paprocki        lpaprocki@hotmail.com   259.688783099
Donette Foller  donette.foller@cox.net  282.32979844
$ cat addresses.txt
first_name      last_name       company_name    address city    county  state   zip     phone1  phone2  email   web
Abel    Maclead Rangoni Of Florence     37275 St  Rt 17m M      Middle Island   Suffolk NY      11953   631-335-3414    631-677-3675       amaclead@gmail.com      http://www.rangoniofflorence.com
Art     Venere  Chemel, James L Cpa     8 W Cerritos Ave #54    Bridgeport      Gloucester      NJ      08014   856-636-8749    856-264-4130       art@venere.org  http://www.chemeljameslcpa.com
Donette Foller  Printing Dimensions     34 Center St    Hamilton        Butler  OH      45011   513-570-1893    513-549-4561    donette.foller@cox.net     http://www.printingdimensions.com
Simona  Morasca Chapman, Ross E Esq     3 Mcauley Dr    Ashland Ashland OH      44805   419-503-2484    419-800-6759    simona@morasca.com http://www.chapmanrosseesq.com
Kiley   Caldarera       Feiner Bros     25 E 75th St #69        Los Angeles     Los Angeles     CA      90034   310-498-5651    310-254-3084       kiley.caldarera@aol.com http://www.feinerbros.com

Expected output:

Art        Venere     James,L,Cpa,8,W                                         256.62 nameemailavg.tab
Lenna      Paprocki                                                           259.69 nameemailavg.tab
Donette    Foller     Dimensions,34,Center,St,Hamilton                        282.33 nameemailavg.tab

The undesirable output is the extra first line built from the header of addresses.txt (first_name, last_name, an empty address, 000.00, addresses.txt):

first_name last_name                                                          000.00 addresses.txt
Art        Venere     James,L,Cpa,8,W                                         256.62 nameemailavg.tab
Lenna      Paprocki                                                           259.69 nameemailavg.tab
Donette    Foller     Dimensions,34,Center,St,Hamilton                        282.33 nameemailavg.tab

Questions:

  1. Why does my code print the first line from the file "addresses.txt"?
  2. An awk program consists of pattern { action } statements. Since lines 11-14 have only an action and no pattern, are they skipped because the previous block has next?
  3. I updated the code block (lines 10-15) to include the pattern FS="\t" so that it follows the pattern { action } format. Unfortunately, that had no impact.
  4. When I reversed the call to awk like so, ./ch10_challenge.awk nameemailavg.tab addresses.txt, the output was messed up (as expected), but the first line was from the file "nameemailavg.tab". In other words, whichever file comes first, its first line appears in the output as a 'header'.

Here is the revised code from the second block (line 10ff):

  FS = "\t" {
    first_n = tolower($1) # first name
    last_n = tolower($2) # last name
    full_n = first_n":"last_n # full name used as array index
    printf ("%-10s %-10s %-55s %06.2f %-10s\n", $1, $2, address[full_n], $4, FILENAME)          
  }' $1 $2
Ed Morton
  • Please [edit] your question to show concise, testable textual sample input and expected output plus the undesirable output you're currently getting – Ed Morton Jun 10 '22 at 11:40
  • If FNR == 1 and NR == FNR, then the condition for the first action fails, so the 2nd action is executed. Add `NR == 1 {next}` if you want to skip it – glenn jackman Jun 10 '22 at 13:14
  • @glennjackman, thank you. If you don't mind my asking, what drew your focus to that section of the code (experience, having seen a lot of code, etc.)? – Backspace Mild Jun 11 '22 at 08:02
  • @glennjackman. What I don't understand is, why don't I see the same issue on https://www.baeldung.com/linux/awk-multiple-input-files#3-join-and-calculate. I ran the example and without the NR == 1, the first file header does not print in the output. – Backspace Mild Jun 11 '22 at 08:26
  • Both the condition and the action are optional. If the condition is missing, the action is taken unconditionally. If the action is missing, the default action is to print the current input record. Hence, the shorthand `1` idiom to say "print the current line" (the condition is true by definition, and the implicit action is `{ print $0 }`). – tripleee Jun 11 '22 at 08:46
  • As an aside, your script is not using any Bash features, so you could use `#!/bin/sh` in the shebang line for improved portability and, on some platforms, a tiny bit faster execution. In both `sh` and Bash, the lack of quoting around `$1` and `$2` is a bug which prevents your script from working on filenames which contain whitespace or some other shell metacharacters; see also [When to wrap quotes around a shell variable?](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Jun 11 '22 at 08:50
  • Accessing the link requires us to register with LinkedIn, which some of us are absolutely unwilling to do. – tripleee Jun 11 '22 at 08:53
  • @tripleee, Thank you and I really appreciate the help with the questions and constructive feedback. When you have time, can you please tell me what you did to fix the tables? The LinkedIn link was there just to show folks that it is not a real challenge and if they choose to verify, they can. Thanks again. – Backspace Mild Jun 11 '22 at 09:12
  • The [revision history](https://stackoverflow.com/posts/72573104/revisions) shows exactly what I did. There is a markup help sidebar with links to more information when you are editing. – tripleee Jun 11 '22 at 09:14
  • As another aside, probably prefer `printf` over the non-portable `echo -e`; it would also improve your ability to format the data legibly. – tripleee Jun 11 '22 at 09:56
  • I removed all the stuff that was cluttering up your question and added the missing parts (contents of the 2 input files). It's still not clear why you show actual and expected output with `|`s in them when your code doesn't print any `|`s, nor is it clear why you'd want the first line of output separated from the rest by a `| ---- | ---- | ---- | ---- | ---- |` line. Please [edit] your question to show real actual+expected output given your posted sample input. – Ed Morton Jun 11 '22 at 11:57
  • @BackspaceMild: (1) yes, I suppose that's experience. I saw `FNR > 1 && NR==FNR` and I thought "that's pretty specific. What if ..." (2) you don't see the same on the linked site because the 2nd action there has the `FNR > 1` condition on it and your code does not. – glenn jackman Jun 11 '22 at 21:48
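A small sketch of tripleee's quoting point, assuming a POSIX sh and a hypothetical input file named `name list.tab` (with a space in it). `set --` stands in for the arguments the script would receive, and the two `printf` lines show how the shell splits an unquoted `$1` into separate words before awk ever sees them:

```shell
#!/bin/sh
# Scratch directory so the demo file does not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical input file whose name contains a space.
printf 'Art\tVenere\n' > 'name list.tab'

# Simulate the script being called as: ./ch10_challenge 'name list.tab'
set -- 'name list.tab'

printf '<%s>\n' "$1"    # quoted: one word   -> <name list.tab>
printf '<%s>\n' $1      # unquoted (intentional here): split into <name> and <list.tab>

# With quotes, awk receives the file name intact.
awk '{ print FILENAME ": " $1 }' "$1"    # -> name list.tab: Art
```

Passed unquoted, awk would instead be asked to open two files named `name` and `list.tab`, neither of which exists.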

2 Answers


As Glenn said, add

NR == 1 {next}

at the top. FNR > 1 is false for the first record, so the first block (including its next) doesn't run for that record, and it falls through to the next block.
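A minimal sketch of that fix, assuming the question's awk program is otherwise unchanged and using trimmed copies of the two sample files (header plus two address rows; the unused phone, email, and web columns are dropped). The `mktemp -d` scratch directory only keeps the demo from overwriting real files, and `join_files` is a hypothetical helper name:

```shell
#!/bin/sh
# Scratch directory so the demo files do not overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Trimmed copies of the question's sample data.
printf 'first_name\tlast_name\tcompany_name\taddress\tcity\tcounty\tstate\tzip\nArt\tVenere\tChemel, James L Cpa\t8 W Cerritos Ave #54\tBridgeport\tGloucester\tNJ\t08014\nDonette\tFoller\tPrinting Dimensions\t34 Center St\tHamilton\tButler\tOH\t45011\n' > addresses.txt
printf 'Art\tVenere\tart@venere.org\t256.62394383\nLenna\tPaprocki\tlpaprocki@hotmail.com\t259.688783099\nDonette\tFoller\tdonette.foller@cox.net\t282.32979844\n' > nameemailavg.tab

# The question's awk program with NR == 1 { next } added at the top.
# Like the original, it splits on default whitespace, not on tabs.
join_files() {
    awk '
        NR == 1 { next }                  # header of the first file: skip it
        FNR > 1 && NR == FNR {            # remaining lines of the first file
            full_name = tolower($1) ":" tolower($2)
            address[full_name] = $4 "," $5 "," $6 "," $7 "," $8
            next
        }
        {                                 # lines of the second file
            full_n = tolower($1) ":" tolower($2)
            printf "%-10s %-10s %-55s %06.2f %-10s\n", $1, $2, address[full_n], $4, FILENAME
        }
    ' "$1" "$2"
}

join_files addresses.txt nameemailavg.tab
```

On this data it should print only the three nameemailavg.tab lines of the expected output. Note that NR == 1 matches only the very first line awk reads, so it skips the header only when addresses.txt is passed as the first argument.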

Paul Hodges

Questions with answers:

  1. Why does my code print the first line from the file "addresses.txt"?

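Because you put the FNR > 1 test in the wrong place, the next is only executed for the second and later lines of the first file. The first line of the first file skips that block entirely and falls through to your printf block.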
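One way to see this, sketched with two tiny hypothetical files (first.txt with a header and one row, second.txt with one row): keep the question's two conditions, but have each block only report which record reached it. `trace_blocks` is an illustrative helper name:

```shell
#!/bin/sh
# Scratch directory so the demo files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-ins for the two inputs.
printf 'first_name\tlast_name\nArt\tVenere\n' > first.txt
printf 'Art\tVenere\n' > second.txt

# Same conditions as the question's script; each block just reports
# the file and per-file line number of the record it handled.
trace_blocks() {
    awk '
        FNR > 1 && NR == FNR { print "block 1:", FILENAME, FNR; next }
                             { print "block 2:", FILENAME, FNR }
    ' "$@"
}

trace_blocks first.txt second.txt
```

This prints `block 2: first.txt 1`, `block 1: first.txt 2`, `block 2: second.txt 1`. Line 1 of the first file fails FNR > 1, so it never reaches the next and lands in the second block, which is exactly what happens to the header of addresses.txt.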

  2. An awk program consists of pattern { action } statements. Since lines 11-14 have only an action and no pattern, are they skipped because the previous block has next?

"pattern" is an ambiguous, vague term: it's a condition, not a pattern, and all of the existing documentation that says otherwise is misleading at best. So awk scripts are a series of condition-action statements. Your second block has a condition, the default one of "true", so it is executed for every record that isn't skipped by the next in the first block. If it weren't executed you'd be getting no output at all, since that second block is where your printf statement is.
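The defaults described here (and in tripleee's comment on the question) can be checked with a few one-liners; a sketch assuming any POSIX awk:

```shell
#!/bin/sh
# An action with no condition runs for every record that reaches it.
printf 'a\nb\n' | awk '{ print "no condition:", $0 }'    # -> no condition: a / no condition: b

# A condition with no action uses the default action, print $0.
printf 'a\nb\n' | awk 'NR == 2'                          # -> b

# next only skips the rest of the program for records whose block actually ran.
printf 'a\nb\n' | awk 'NR == 1 { next } { print "reached block 2:", $0 }'    # -> reached block 2: b
```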

  3. I updated the code block (lines 10-15) to include the pattern FS="\t" so that it follows the pattern { action } format. Unfortunately, that had no impact.

Doing that makes no sense. You added a condition that says "set FS to \t and if the result of that is that FS is not null, which it's not since it's \t, then execute the associated action". I don't know what impact you thought that might have.
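A sketch of what an assignment used as a condition does, assuming a POSIX awk (v is just a throwaway variable). The last two commands rest on an assumption about intent: if the goal was to split the tab-separated files on tabs, FS is normally set in a BEGIN block (or with -F) so that it is in effect before the first record is read:

```shell
#!/bin/sh
# FS = "\t" used as a condition is an assignment; its value "\t" is a
# non-empty string, so the condition is true and the action runs for every record.
printf 'x\ny\n' | awk 'FS = "\t" { n++ } END { print n }'        # -> 2

# An assignment whose value is the empty string is false, so this action never runs.
printf 'x\ny\n' | awk 'v = "" { n++ } END { print n + 0 }'       # -> 0

# Setting FS to a tab before any input is read keeps spaces inside fields.
printf 'Art\tVenere\t8 W Cerritos Ave #54\n' | awk 'BEGIN { FS = "\t" } { print $3 }'  # -> 8 W Cerritos Ave #54

# With the default FS, runs of blanks and tabs both separate fields.
printf 'Art\tVenere\t8 W Cerritos Ave #54\n' | awk '{ print $3 }'                      # -> 8
```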

  4. When I reversed the call to awk like so, ./ch10_challenge.awk nameemailavg.tab addresses.txt, the output was messed up (as expected), but the first line was from the file "nameemailavg.tab". In other words, whichever file comes first, its first line appears in the output as a 'header'.

Right. Whatever file you list first in the arguments is read first. Again, not sure what impact you thought swapping the input files might have if not that.
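File order matters because NR == FNR only holds while awk reads its first file argument: NR counts records across all files, while FNR restarts at 1 in each file. A sketch with two hypothetical files, one.txt (two lines) and two.txt (one line); `show_counts` is an illustrative helper name:

```shell
#!/bin/sh
# Scratch directory so the demo files do not clobber anything.
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > one.txt
printf 'c\n' > two.txt

# Print the counters for every record and whether NR == FNR holds.
show_counts() {
    awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' "$@"
}

show_counts one.txt two.txt    # one.txt 1 1 first file / one.txt 2 2 first file / two.txt 3 1 later file
show_counts two.txt one.txt    # two.txt 1 1 first file / one.txt 2 1 later file / one.txt 3 2 later file
```

Swapping the arguments swaps which file the NR == FNR block consumes, which is why the "header" always comes from whichever file is listed first.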

This is what your script is trying to do:

$ cat ./ch10_challenge
#!/usr/bin/env bash

awk '
    {
        first_n = tolower($1)           # first name
        last_n  = tolower($2)           # last name
        full_n  = first_n ":" last_n    # full name used as array index
    }
    NR==FNR {
        if ( FNR > 1 ) {
            address_city_county_state_zip = $4","$5","$6","$7","$8
            address[full_n] = address_city_county_state_zip
        }
        next
    }
    {
        printf "%-10s %-10s %-55s %06.2f %-10s\n", $1, $2, address[full_n], $4, FILENAME
    }
' "$1" "$2"

$ ./ch10_challenge addresses.txt nameemailavg.tab
Art        Venere     James,L,Cpa,8,W                                         256.62 nameemailavg.tab
Lenna      Paprocki                                                           259.69 nameemailavg.tab
Donette    Foller     Dimensions,34,Center,St,Hamilton                        282.33 nameemailavg.tab

Note that your script is a shell script, not an awk script, so adding a .awk suffix would not be appropriate.

Get the book Effective AWK Programming, 5th Edition, by Arnold Robbins to learn AWK.

Ed Morton