
I am learning file comparison using awk.

I found syntax like the following:

awk 'NR==FNR{a[$1];next}$1 in a{print $1}' file1 file2

I couldn't understand the significance of NR==FNR here. If I use FNR==NR instead, I get the same output. Why?

What exactly does it do?

kvantour
Amit

6 Answers


In Awk:

  • FNR refers to the record number (typically the line number) in the current file.
  • NR refers to the total number of records read so far, across all input files.
  • The operator == is a comparison operator, which returns true when the two surrounding operands are equal.

This means that the condition NR==FNR is normally only true for the first file, as FNR resets back to 1 for the first line of each file but NR keeps on increasing.
This pattern is typically used to perform actions on only the first file. It works assuming that the first file is not empty, otherwise the two variables would continue to be equal while Awk was processing the second file.
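A quick way to see the empty-file caveat, sketched with invented files in a temporary directory (assumes a POSIX shell with `mktemp`). The `FILENAME==ARGV[1]` variant is just one common workaround, not the only one, and it assumes the first argument is a real file name that is not repeated later on the command line.

```shell
#!/bin/sh
# Sketch of the empty-first-file pitfall. File names and contents are invented.
dir=$(mktemp -d)
: > "$dir/empty"                  # zero-length first file
printf 'x\ny\n' > "$dir/file2"

# Goal: print keys of file2 that are NOT in file1.
# With NR==FNR, the lines of file2 still satisfy NR==FNR (1==1, 2==2),
# so file2 is loaded into the array and the second rule never runs.
out_nr=$(awk 'NR==FNR{a[$1]; next} !($1 in a){print $1}' "$dir/empty" "$dir/file2")

# Workaround (assumption: ARGV[1] is a real file name, not a var=value
# assignment, and the same file is not passed twice).
out_fn=$(awk 'FILENAME==ARGV[1]{a[$1]; next} !($1 in a){print $1}' "$dir/empty" "$dir/file2")

printf 'NR==FNR -> [%s]\n' "$out_nr"            # empty: nothing printed
printf 'FILENAME==ARGV[1] -> [%s]\n' "$out_fn"  # x and y, as intended
rm -r "$dir"
```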

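The next inside the block tells awk to stop processing the current record and read the next one, so the rest of the script (here, $1 in a{print $1}) is skipped for lines of the first file and only runs on lines of the other files.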
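To make the effect of next concrete, here is a small comparison on invented sample files, running the script with and without it (a sketch; file names and contents are made up).

```shell
#!/bin/sh
# Sketch: what `next` changes. Sample files are invented for the demo.
dir=$(mktemp -d)
printf 'apple\nbanana\n'  > "$dir/file1"
printf 'banana\ncherry\n' > "$dir/file2"

# With next: file1 lines stop after being stored, so only file2 lines are tested.
with_next=$(awk 'NR==FNR{a[$1]; next} $1 in a{print $1}' "$dir/file1" "$dir/file2")

# Without next: each file1 line is stored and then immediately found in a,
# so every line of file1 is printed as well.
without_next=$(awk 'NR==FNR{a[$1]} $1 in a{print $1}' "$dir/file1" "$dir/file2")

printf '%s\n--\n%s\n' "$with_next" "$without_next"
rm -r "$dir"
```

With next only "banana" is printed; without it, "apple" and "banana" from file1 also appear before the real match from file2.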

The condition FNR==NR compares the same two operands as NR==FNR, so it behaves in the same way.

Tom Fenech
  • "=" is sometimes used to test equality, and sometimes to make an assignment. FNR==NR would be different than NR==FNR if the double equals sign was being used for assignment. So for someone unfamiliar with awk, such as this asker, it seems reasonable to ask if they're the same. – Todd Walton Dec 19 '18 at 18:28
  • @ToddWalton Good point! Another example: `a='3x'; if [[ $a == 3* ]]; then echo yes; fi` and you can not switch both sides of `==`. – Walter A Dec 19 '18 at 22:46
  • @WalterA yes that's true (in Bash, at least). Are you suggesting any improvement to my answer? – Tom Fenech Dec 20 '18 at 00:36
  • No, your answer is fine. I really like to see that the community likes our answers just as much. We use different styles and both are regarded very helpful. I just gave you an upvote, so for this moment we have the same number of upvotes. – Walter A Dec 20 '18 at 08:03
  • Terrific explanation @Tom Fenech Thank you! – Roger Costello Aug 22 '22 at 22:26
  • Just a heads up that `NR==FNR` doesn't work as expected if your first input file is empty. Having no lines means that NR is still zero going into the second file. – Mr. Llama Jan 15 '23 at 03:42
  • @Mr.Llama true, I updated my answer to mention that case, thanks. – Tom Fenech Jan 16 '23 at 09:03

Look for keys (first word of line) in file2 that are also in file1.
Step 1: Fill array a with the first word of each line of file1:

awk '{a[$1];}' file1

Step 2: Fill array a and ignore file2 in the same command. To do this, compare the total number of records read so far (NR) with the record number in the current input file (FNR); they are equal only while reading file1.

awk 'NR==FNR{a[$1]}' file1 file2
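To check what step 2 actually stores, you can dump the array at the end. The END block and the sample data below are additions for inspection only (a sketch, not part of the original one-liner).

```shell
#!/bin/sh
# Sketch: after step 2, array a holds only file1's first words.
dir=$(mktemp -d)
printf 'k1 foo\nk2 bar\n' > "$dir/file1"
printf 'k3 baz\nk1 qux\n' > "$dir/file2"

# for (k in a) visits keys in an unspecified order, hence the sort.
keys=$(awk 'NR==FNR{a[$1]} END{for (k in a) print k}' "$dir/file1" "$dir/file2" | sort)
printf '%s\n' "$keys"   # k1 and k2; k3 from file2 never enters a
rm -r "$dir"
```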

Step 3: Skip the actions that come after the closing } while processing file1:

awk 'NR==FNR{a[$1];next}' file1 file2 

Step 4: Print the key from file2 when it is found in array a:

awk 'NR==FNR{a[$1];next} $1 in a{print $1}' file1 file2
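A small run of the final one-liner on invented data. Note that matches come out in file2's order, and a key that appears more than once in file2 is printed each time.

```shell
#!/bin/sh
# Sketch: step 4 on made-up files.
dir=$(mktemp -d)
printf 'k1 foo\nk2 bar\n' > "$dir/file1"
printf 'k3 baz\nk1 qux\nk2 zip\nk1 again\n' > "$dir/file2"

common=$(awk 'NR==FNR{a[$1]; next} $1 in a{print $1}' "$dir/file1" "$dir/file2")
printf '%s\n' "$common"   # k1, k2, k1 (file2 order, duplicates kept)
rm -r "$dir"
```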
Walter A
  • Brilliant takedown of this one-liner. Is the semicolon in Step 1 necessary? – Tomasz Gandor Aug 08 '17 at 05:53
  • @TomaszGandor The semicolon is not needed in step 1. I could have added it in step 3 instead, but `;next` would look like a weird addition there (it is really `next` being added, which then needs the semicolon). You can test step 1 with `awk '{a[$1]} END { for (k in a) { print "a[k]=" k } }' file1`. – Walter A Aug 08 '17 at 10:30

Look up NR and FNR in the awk manual, then ask yourself under which condition NR==FNR holds in the following example:

$ cat file1
a
b
c

$ cat file2
d
e

$ awk '{print FILENAME, NR, FNR, $0}' file1 file2
file1 1 1 a
file1 2 2 b
file1 3 3 c
file2 4 1 d
file2 5 2 e
Ed Morton
  • Is it also possible to print the number of the file being processed? Is there a built-in variable for that? (I know we could create a variable for that and increment it every time FNR is 1.) – LEo Sep 19 '19 at 16:33
  • In GNU awk that variable is `ARGIND`; otherwise you can do `FNR==1{ print ++file_nr }`. – Ed Morton Sep 19 '19 at 19:14
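A sketch of the portable counter on two invented files, printing the file number next to FNR for each line. One caveat worth knowing: an empty file has no record with FNR==1, so it is not counted.

```shell
#!/bin/sh
# Sketch: count input files with FNR==1 (works in any POSIX awk).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\n'    > "$dir/two"

# file_nr is incremented on the first record of each file.
out=$(awk 'FNR==1{ ++file_nr } { print file_nr, FNR, $0 }' "$dir/one" "$dir/two")
printf '%s\n' "$out"   # 1 1 a / 1 2 b / 2 1 c
rm -r "$dir"
```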

These are awk built-in variables:

NR - the total number of records processed so far, across all input files.

FNR - the record number within the current input file; it resets to 1 at the start of each file.

Dhruvenkumar Shah
sat

Assume you have files a.txt and b.txt with the following contents:

cat a.txt
a
b
c
d
1
3
5
cat b.txt
a
1
2
6
7

Keep in mind that NR and FNR are awk built-in variables:

NR - the total number of records processed so far (here, across both a.txt and b.txt).
FNR - the record number within the current input file (a.txt or b.txt); it resets to 1 when a new file starts.

awk 'NR==FNR{a[$0];}{if($0 in a)print FILENAME " " NR " " FNR " " $0}' a.txt b.txt
a.txt 1 1 a
a.txt 2 2 b
a.txt 3 3 c
a.txt 4 4 d
a.txt 5 5 1
a.txt 6 6 3
a.txt 7 7 5
b.txt 8 1 a
b.txt 9 2 1

Let's add next, so that records matched by NR==FNR (the lines of a.txt) skip the second action.

Lines in both b.txt and a.txt:

awk 'NR==FNR{a[$0];next}{if($0 in a)print FILENAME " " NR " " FNR " " $0}' a.txt b.txt
b.txt 8 1 a
b.txt 9 2 1

Lines in b.txt but not in a.txt:

awk 'NR==FNR{a[$0];next}{if(!($0 in a))print FILENAME " " NR " " FNR " " $0}' a.txt b.txt
b.txt 10 3 2
b.txt 11 4 6
b.txt 12 5 7

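The same check, relying on awk's default action (print the whole record) when a pattern has no action block:

awk 'NR==FNR{a[$0];next}!($0 in a)' a.txt b.txt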
2
6
7

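Here is the same logic as a runnable Python sketch that mirrors awk's NR/FNR bookkeeping (an illustration, not how awk is implemented internally), for your interest:

import sys

def run(files):
    # Mimics: awk 'NR==FNR{a[$1];next} $1 in a{print $1}' file1 file2
    a = set()   # stands in for awk's associative array a
    NR = 0      # records read so far, across all files
    for path in files:
        FNR = 0  # record number within the current file; reset per file
        with open(path) as f:
            for line in f:
                NR += 1
                FNR += 1
                columns = line.split()  # like awk's default field splitting
                key = columns[0] if columns else ""  # $1 is "" on a blank line
                if NR == FNR:   # true only while reading the first file (if it is not empty)
                    a.add(key)
                    continue    # plays the role of awk's next
                if key in a:    # processing the remaining files
                    print(key)

if __name__ == "__main__":
    run(sys.argv[1:])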
Franz Wong