
I have a text file, named "hosts.tbl":

BILL RED
VAL YELLOW
STEVE YELLOW
TOM ORANGE
BILLY RED
VALERIE BLUE

I have a second file, named "details.tbl", which contains each name above multiple times (among various other details on each line). I need to count how many times each name appears in "details.tbl" and end up with something like this:

BILL RED 8
VAL YELLOW 16
STEVE YELLOW 9
TOM ORANGE 1
BILLY RED 2
VALERIE BLUE 30

As you can see, a plain grep for 'BILL' matches both "BILL" and "BILLY", and the same goes for "VAL" and "VALERIE". However, in "details.tbl", every occurrence of a name is followed by "-C". For example:

STEVE-C
STEVE-C
BILL-C
BILLY-C
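A quick check of why the "-C" suffix is enough to tell these names apart (the sample lines are hypothetical). Note that a pattern like BILL-C would still match a longer name that ends in BILL, such as a hypothetical JOEBILL-C.

```sh
# Hypothetical lines in the details.tbl format described above.
sample='BILL-C
BILLY-C
VAL-C
VALERIE-C'

printf '%s\n' "$sample" | grep -c 'BILL'     # bare name: matches BILL-C and BILLY-C, prints 2
printf '%s\n' "$sample" | grep -c 'BILL-C'   # with suffix: matches BILL-C only, prints 1
printf '%s\n' "$sample" | grep -c 'VAL-C'    # matches VAL-C but not VALERIE-C, prints 1
```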

I have tried:

awk {'print $1 " " $2 " "'} hosts.tbl|grep -c $1"-C" details.tbl
awk {'print $1 " " $2 " "'grep -c $1"-C" details.tbl} hosts.tbl

...and various other permutations of similar syntax, all dismal failures. Clearly, I am a neophyte when it comes to shell commands in particular, and UNIX in general. What am I missing here? I cannot find anything in the man pages about how to concatenate search criteria in grep, or how to pass only specific fields from awk to grep.

Assuming the applicable portion of the details.tbl file looks like this:

BILL-C
VAL-C
STEVE-C
TOM-C
BILLY-C
VALERIE-C
BILL-C
VAL-C
STEVE-C
TOM-C
BILLY-C
VALERIE-C

The output should look like this:

BILL RED 2
VAL YELLOW 2
STEVE YELLOW 2
TOM ORANGE 2
BILLY RED 2
VALERIE BLUE 2
  • It's not clear whether returning both BILL and BILLY (for example) is what you need. Given your `-C` file, please **edit your Q** to show your expected output for one of the 2 entry items. (While not necessary in this case, it's a good idea to keep flagging your AIX Qs as such, because that system is very different from Linux, and even from other vendors' old-style Unixes.) Good luck. – shellter Jun 09 '16 at 22:00
  • Explained differently (if I understand correctly): I have a file `hosts.tbl` with first and last names. Another file `details.tbl` has only the first names, each followed by `-C`. All the first names in `hosts.tbl` are unique. I want to count each first name and print the count together with the last name. – Walter A Jun 12 '16 at 13:00

2 Answers


cat hosts.tbl

BILL RED
VAL YELLOW
STEVE YELLOW
TOM ORANGE
BILLY RED
VALERIE BLUE

cat details.tbl

BILL RED
VAL YELLOW
STEVE YELLOW
TOM ORANGE
BILLY RED
VALERIE BLUE
BILL RED
VAL YELLOW
STEVE YELLOW
TOM ORANGE
BILLY RED
VALERIE BLUE
BILL RED
VAL YELLOW
STEVE YELLOW
TOM ORANGE

With awk, we read the names from the first file and store them as keys of array a. For each line of the second file, if the line is present in a, its count is incremented.

awk 'FILENAME == ARGV[1] {a[$0] = 0; next}
     FILENAME == ARGV[2] && ($0 in a) {a[$0]++}
     END {for (i in a) print i, a[i]}' hosts.tbl details.tbl

Output (for (i in a) visits the keys in an unspecified order, so the line order may differ):

VALERIE BLUE 2
BILLY RED 2
BILL RED 3
VAL YELLOW 3
TOM ORANGE 3
STEVE YELLOW 3
Chet
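The script above assumes details.tbl holds whole "NAME COLOR" lines, and for (i in a) prints in an unspecified order. Below is a minimal sketch adapted to the question's format, assuming each name appears in details.tbl as its own whitespace-separated NAME-C token. It counts tokens rather than lines, so a line that mentions a name twice counts twice, and it keeps the order of hosts.tbl. The function name count_names is hypothetical.

```sh
# count_names HOSTS DETAILS   (hypothetical helper, not from the answer)
# Prints each line of HOSTS followed by the number of whitespace-separated
# NAME-C tokens in DETAILS whose NAME equals that line's first field.
# Output follows the line order of HOSTS.
count_names() {
    awk '
        # First file: remember each hosts line and its first name, in order.
        FILENAME == ARGV[1] { line[++n] = $0; name[n] = $1; next }

        # Second file: count every field ending in "-C", keyed by the part
        # before the suffix (so BILLY-C never counts toward BILL).
        {
            for (f = 1; f <= NF; f++)
                if ($f ~ /-C$/)
                    count[substr($f, 1, length($f) - 2)]++
        }

        # "+ 0" prints 0 instead of an empty string for names never seen.
        END { for (i = 1; i <= n; i++) print line[i], count[name[i]] + 0 }
    ' "$1" "$2"
}
```

Usage: count_names hosts.tbl details.tbl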

If you are willing to ignore the advice in https://unix.stackexchange.com/a/169765/57293 (which explains why shell loops are a poor way to process text), you can write a solution like this:

while read -r name lastname; do
   printf '%s %s %s\n' "$name" "$lastname" "$(grep -c -- "${name}-C" details.tbl)"
done < hosts.tbl

With awk, you should process details.tbl first and count the lines per name. How to process two files differently in one awk script is explained in the question "What is "NR==FNR" in awk?".
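A small sketch of what the NR==FNR test does, using two throwaway files in a scratch directory (created with mktemp -d, which is common on Linux and BSD but not required by POSIX). One caveat: if the first file is empty, NR==FNR stays true while reading the second file; testing FILENAME == ARGV[1], as the other answer does, avoids that.

```sh
# Scratch directory so no existing files are touched.
tmp=$(mktemp -d)
printf '%s\n' one two > "$tmp/first"
printf '%s\n' three four five > "$tmp/second"

# NR counts records across all input files; FNR restarts at 1 for each file.
# NR == FNR therefore holds only while awk reads the first file.
out=$(awk 'NR == FNR { print "first", FNR, NR, $0; next }
                     { print "second", FNR, NR, $0 }' "$tmp/first" "$tmp/second")
printf '%s\n' "$out"
rm -rf "$tmp"
```

This prints "first 1 1 one", "first 2 2 two", then "second 1 3 three", "second 2 4 four", "second 3 5 five": FNR restarts for the second file while NR keeps counting.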
To ignore the -C, you can preprocess the input file with cut (the <(...) process substitution needs bash, ksh93, or zsh):

awk 'NR==FNR {a[$0]++;next} {
       for(i in a) {
         if ($1==i) {
           print $0, a[i]
         }
       }
     }' <(cut -d"-" -f1<details.tbl) hosts.tbl

awk can split on the dash itself (-F '[ -]'), so the cut preprocessing is not needed:

awk -F '[ -]' 'NR==FNR {a[$1]++; next} {
       for(i in a) {
         if ($1==i) {
           print $0, a[i]
         }
       }
     }' details.tbl hosts.tbl
Walter A
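As a check against the question's sample data, here is a sketch of the last script run in a scratch directory (again assuming mktemp -d is available). The inner for (i in a) loop is replaced by a direct ($1 in a) lookup, which prints the same lines without scanning every name for each hosts line.

```sh
tmp=$(mktemp -d)
# hosts.tbl and the details.tbl excerpt from the question.
printf '%s\n' 'BILL RED' 'VAL YELLOW' 'STEVE YELLOW' \
              'TOM ORANGE' 'BILLY RED' 'VALERIE BLUE' > "$tmp/hosts.tbl"
for pass in 1 2; do
    printf '%s\n' BILL-C VAL-C STEVE-C TOM-C BILLY-C VALERIE-C
done > "$tmp/details.tbl"

# -F '[ -]' splits on a space or a dash, so $1 is the bare first name in both files.
out=$(awk -F '[ -]' 'NR == FNR { a[$1]++; next }
                     ($1 in a)  { print $0, a[$1] }' "$tmp/details.tbl" "$tmp/hosts.tbl")
printf '%s\n' "$out"
rm -rf "$tmp"
```

This prints the six lines of the expected output from the question, each with a count of 2, in hosts.tbl order. As in the original, a name that never appears in details.tbl produces no line; printing a[$1] + 0 for every hosts line would show it with 0 instead.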