Using grep and awk together

Question

I have a file (A.txt) with 4 columns of numbers and another file with 3 columns of numbers (B.txt). I need to solve the following problems:

Find all lines in A.txt whose 3rd column has a number that appears anywhere in the 3rd column of B.txt.
Assume that I have many files like A.txt in a directory and I need to run this for every file in that directory.

How do I do this?

A `while` loop together with `awk` should suffice. What have you tried? Do you have any sample input together with desired output? — fedorqui, Apr 04 '14 at 14:27
This (1) sounds like a good job for `join`... along with a `for` loop to iterate through the individual files (2). — twalberg, Apr 04 '14 at 14:41

score 39 · Answer 1 · edited May 23 '17 at 11:53

You should never see someone using grep and awk together because whatever grep can do, you can also do in awk:

Grep and Awk:

grep "foo" file.txt | awk '{print $1}'

Using Only Awk:

awk '/foo/ {print $1}' file.txt
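As a quick check of the claim above, here is a minimal sketch that runs both forms on the same input and compares the results. The sample lines and the temporary file are made up for illustration; they are not from the question. For a plain string like foo the two agree, though grep defaults to basic regular expressions and awk uses extended ones, so more complex patterns may need adjusting.

```shell
# Hypothetical sample data, written to a temporary file so nothing is overwritten.
tmp=$(mktemp)
printf '%s\n' 'alpha foo' 'beta bar' 'gamma foo' > "$tmp"

# Two-step version: grep selects the lines, awk prints the first field.
two_step=$(grep "foo" "$tmp" | awk '{print $1}')

# Single-step version: awk selects the lines and prints the first field.
awk_only=$(awk '/foo/ {print $1}' "$tmp")

printf 'grep | awk:\n%s\n' "$two_step"
printf 'awk only:\n%s\n' "$awk_only"
rm -f "$tmp"
```

Both variables should hold the same two lines, alpha and gamma.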

I had to get that off my chest. Now to your problem...

Awk is a programming language built around a single implicit loop over every line of every input file. That isn't quite what you want here: you want to treat B.txt as a special file, load it first, and then loop through your other files. That normally calls for something like Python or Perl. (Versions of Bash before 4.0 don't have associative arrays, so a pure-Bash solution won't work there.) However, it looks like slitvinov found an answer in awk.

Here's a Perl solution anyway:

use strict;
use warnings;
use feature qw(say);
use autodie;

my $b_file = shift;
open my $b_fh, "<", $b_file;

#
# This tracks the values in "B"
#
my %valid_lines;
while ( my $line = <$b_fh> ) {   # Read from the filehandle, not the filename
    chomp $line;
    my @array = split /\s+/, $line;
    $valid_lines{$array[2]} = 1;   #Third column
}
close $b_fh;

#
# This handles the rest of the files
#
while ( my $line = <> ) {  # The rest of the files
    chomp $line;
    my @array = split /\s+/, $line;
    next unless exists $valid_lines{$array[2]};  # Next unless field #3 was in b.txt too
    say $line;
}

Re: `You should never see someone using grep and awk together...` I've got a series of `syslog` files in `/var/log` (some compressed). I need to match against a string `voltage` as a flag that further processing is required, but this string isn't always in the same field. `zgrep` and `awk` strike me as a reasonable approach for this. If I can replace a complex `awk` action with a simple `grep` action, then why not? — , Jan 17 '21 at 10:16

score 12 · Accepted Answer · slitvinov · 2014-04-04T17:04:51.107

Here is an example. Create the following files and run

awk -f c.awk B.txt A*.txt

c.awk

FNR==NR {
    s[$3]
    next
}

$3 in s {
    print FILENAME, $0
}

A1.txt

1 2 3
1 2 6
1 2 5

A2.txt

1 2 3
1 2 6
1 2 5

B.txt

1 2 3
1 2 5
2 1 8

The output should be:

A1.txt 1 2 3
A1.txt 1 2 5
A2.txt 1 2 3
A2.txt 1 2 5
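The question also asks to run this for every file in a directory. Passing A*.txt on one command line, as above, already covers that. If a separate result file per input is preferred, a shell loop is one option. The following is a minimal sketch under assumptions: the temporary directory, the inline one-line form of c.awk, and the .matched output suffix are all illustrative choices, not part of the original answer.

```shell
# Recreate the sample files from the answer in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '1 2 3' '1 2 5' '2 1 8' > "$dir/B.txt"
printf '%s\n' '1 2 3' '1 2 6' '1 2 5' > "$dir/A1.txt"
printf '%s\n' '1 2 3' '1 2 6' '1 2 5' > "$dir/A2.txt"

for f in "$dir"/A*.txt; do
    # Hypothetical naming scheme: A1.txt -> A1.matched
    out="${f%.txt}.matched"
    # Same logic as c.awk: remember column 3 of B.txt, then print
    # every line of the current file whose column 3 was remembered.
    awk 'FNR==NR { s[$3]; next } $3 in s' "$dir/B.txt" "$f" > "$out"
done

matched_a1=$(cat "$dir/A1.matched")
matched_a2=$(cat "$dir/A2.matched")
printf '%s\n' "$matched_a1"
rm -rf "$dir"
```

This rereads B.txt once per input file, which is simpler but slower than the single awk invocation when there are many files.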

edited Apr 04 '14 at 17:04

answered Apr 04 '14 at 14:39


That's pretty good. However, what happens if there's a line in `B.txt` that is not in the other files? – David W. Apr 04 '14 at 14:58
I added this line to **B.txt** '2 1 8'. It does not change the output. – slitvinov Apr 04 '14 at 17:08

Okay, I see. The FNR is only for the ORIGINAL file. You're only putting stuff in `s` if the line is in `B.txt`. I take it the second half is only executed once you're out of `B.txt`. – David W. Apr 04 '14 at 18:10
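To make the FNR==NR test concrete: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while awk is reading the first file on the command line. A minimal sketch that prints both counters, using the sample B.txt and A1.txt in a scratch directory (the directory and the variable name are illustrative assumptions):

```shell
dir=$(mktemp -d)
printf '%s\n' '1 2 3' '1 2 5' '2 1 8' > "$dir/B.txt"
printf '%s\n' '1 2 3' '1 2 6' '1 2 5' > "$dir/A1.txt"

# Print the file name, the global record number (NR) and the
# per-file record number (FNR) for every input line.
counters=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' B.txt A1.txt)
printf '%s\n' "$counters"
rm -rf "$dir"
```

The B.txt lines show equal counters (1 1, 2 2, 3 3), while the A1.txt lines show 4 1, 5 2, 6 3, so the second block of c.awk only ever sees lines from the A files. One caveat: if the first file is empty, FNR==NR is also true for the second file, so the idiom assumes B.txt has at least one line.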
