5

I need to process the contents of two files, and I was wondering whether we can pull it off with a single nawk statement.

File A contents:

AAAAAAAAAAAA  1
BBBBBBBBBBBB  2
CCCCCCCCCCCC  3

File B contents:

XXXXXXXXXXX  3
YYYYYYYYYYY  2
ZZZZZZZZZZZ  1

I would like to check whether $2 (the 2nd field) in file A is the reverse of $2 in file B. How do I write rules in nawk for multi-file processing? How would we distinguish A's $2 from B's $2?

EDIT: I need to compare $2 of A's first line (which is 1) with $2 of B's last line (which is also 1), then compare $2 of line 2 in A with $2 of the second-to-last line of B, and so on.

tshepang
tomkaith13
  • please edit your data so that at least one line has $2 being the reverse of each other. Also edit to show what output you expect, given the input. Good luck. – shellter Dec 14 '11 at 19:02
  • @shellter: Hi, I need to compare $2 in the first line of A with $2 in the last line of B. There is no expected output. The question is about how to perform that comparison (if it's even possible) – tomkaith13 Dec 14 '11 at 23:16
  • @shelter: I have added an edit to help you understand the problem better... hope this helps... thanks – tomkaith13 Dec 14 '11 at 23:25

3 Answers

6

You can do something like this -

[jaypal:~/Temp] cat f1
AAAAAAAAAAAA  1
BBBBBBBBBBBB  2
CCCCCCCCCCCC  3
DDDDDDDDDDDD  4

[jaypal:~/Temp] cat f2
AAAAAAAAAAA  5
XXXXXXXXXXX  3
YYYYYYYYYYY  2
ZZZZZZZZZZZ  1

Solution:

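awk '
NR==FNR {a[i++]=$2; next}   # first file (f2): save $2 of each line, then skip the rule below
# second file (f1): walk the saved values backwards and compare with the current $2
{print (a[--i] == $2 ? "Match " $2 FS a[i] : "Do not match " $2 FS a[i])}' f2 f1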
Match 1 1
Match 2 2
Match 3 3
Do not match 4 5
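
To see why the `NR==FNR` test singles out the first file, it may help to print both counters for every record. This is only a small sketch; the two sample files and the temporary directory are made up for the demonstration.

```shell
# Hypothetical sample files, created in a scratch directory for this demo only.
dir=$(mktemp -d) && cd "$dir"
printf 'a 1\nb 2\n' > A
printf 'x 2\ny 1\n' > B

# Print the file name, the global record number (NR) and the
# per-file record number (FNR) for every input line.
result=$(awk '{ print FILENAME, NR, FNR }' A B)
echo "$result"
```

This prints `A 1 1`, `A 2 2`, `B 3 1`, `B 4 2`: the two counters agree only while the first file is being read.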
jaypal singh
  • Ahhh FNR!!... that's the trick ... Didn't know what exactly it's used for... thanks for this – tomkaith13 Dec 15 '11 at 18:16
  • You're welcome. :) Yes, when you have to work with more than one file, `FNR` comes in handy. `FNR` is similar to `NR` in that it holds the record number, but unlike `NR` it restarts at 1 for each new input file. So `NR==FNR` is true only while the first file is being read, which lets us restrict an action to that file and leave the remaining rules for the second file. – jaypal singh Dec 15 '11 at 18:22
  • Gotcha...and with 'next' you kept ensuring that 'NR==FNR' condition is always hit as long as we are processing file A. This is a well thought solution . Kudos – tomkaith13 Dec 15 '11 at 19:21
  • You could shorten that code to `awk 'NR==FNR{a[i++]=$2;next}a[--i]==$2{print "Match",a[i],a[i];next}{print "Do Not Match",a[i],$2}' f1 f2`. – mschilli Sep 02 '13 at 09:04
  • Does this read the file contents into memory? If so, is there a way to do it without reading the file contents into memory? – tommy.carstensen Oct 17 '13 at 17:20
6

You can make awk process files serially, but you can't easily make it process two files in parallel. You probably can achieve the effect with careful use of getline but 'careful' is the operative term.
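
As a rough sketch of what the getline route could look like: file A is the main input, and one line of B is pulled in per record through a command pipe. This assumes GNU `tac` is available to reverse B; the file names, sample data, and the variable name `cmd` are made up for the illustration.

```shell
# Hypothetical sample files, created in a scratch directory for this demo only.
dir=$(mktemp -d) && cd "$dir"
printf 'AAAA 1\nBBBB 2\nCCCC 3\n' > A
printf 'XXXX 3\nYYYY 2\nZZZZ 1\n' > B

# A is read normally; each record also pulls one line from "tac B",
# so line n of A is paired with line n counted from the end of B.
result=$(awk -v cmd="tac B" '
{
    if ((cmd | getline line) > 0) {
        split(line, b)                       # b[2] is the 2nd field of the B line
        verdict = ($2 == b[2]) ? "Match" : "Do not match"
        print verdict, $2, b[2]
    } else {
        print "No line left in B for:", $0   # B is shorter than A
    }
}
END { close(cmd) }' A)
echo "$result"
```

The parentheses around `cmd | getline line` matter: the return value must be compared with 0 so that end of input and read errors are both treated as "no line".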

I think in this case, with simple two-column files, I'd be inclined to use:

paste "File A" "File B" |
awk '{ process fields $1, $2 from File A and fields $3, $4 from file B }'

You would need to make sure the two files are in the appropriate order, etc.
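
For the comparison asked about in the question, "appropriate order" means reversing File B first. A minimal sketch, assuming GNU `tac` is available and using made-up files named A and B:

```shell
# Hypothetical sample files matching the question, in a scratch directory.
dir=$(mktemp -d) && cd "$dir"
printf 'AAAAAAAAAAAA  1\nBBBBBBBBBBBB  2\nCCCCCCCCCCCC  3\n' > A
printf 'XXXXXXXXXXX  3\nYYYYYYYYYYY  2\nZZZZZZZZZZZ  1\n' > B

# tac reverses B; paste joins line n of A with line n of the reversed B.
# After the join, $2 comes from A and $4 comes from B.
result=$(tac B | paste A - | awk '
{
    verdict = ($2 == $4) ? "Match" : "Do not match"
    print verdict, $2, $4
}')
echo "$result"
```

With this layout awk only ever holds one joined line at a time, so it does not need an array of saved values.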

If your input is more complex, then this may not work so well, though you can choose the character that separates the data from the two files with paste -d'|' ... to use a pipe to separate the two records, and awk -F'|' '{ ... }' to read $1 as the info from File A and $2 as the info from File B.
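
A sketch of the delimiter variant, for records that contain a varying number of words. The sample data is invented, the number to compare is assumed to be the last word of each record, and `tac` is again assumed to be available:

```shell
# Hypothetical multi-word records, in a scratch directory for this demo only.
dir=$(mktemp -d) && cd "$dir"
printf 'first record  1\nsecond record  2\n' > A
printf 'other record  2\nlast record  1\n' > B

# paste -d"|" keeps each original record intact on its side of the pipe,
# so awk sees the A record as $1 and the (reversed) B record as $2.
result=$(tac B | paste -d'|' A - | awk -F'|' '
{
    na = split($1, a, " ")    # words of the File A record
    nb = split($2, b, " ")    # words of the File B record
    verdict = (a[na] == b[nb]) ? "Match" : "Do not match"
    print verdict, a[na], b[nb]
}')
echo "$result"
```

This only works if the chosen delimiter never occurs in the data itself.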

Jonathan Leffler
  • Working with interpreters and serial programming is kind of hard to grasp, since I am used to dealing with C. All we need is two pointers and we keep comparing away at the right offset... Thanks for the alternative :) – tomkaith13 Dec 15 '11 at 19:26
0

Have you thought about doing something like the following?

diff --brief <(awk '{print $2}' A) <(tac B | awk '{print $2}')

tac reverses the lines of file B and then you can compare the two columns.
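
The `<( ... )` syntax needs a shell with process substitution, such as bash. In a plain POSIX shell, a possible variant is to write one column to a file and feed the other through standard input; the exit status of diff then tells you the result. This is only a sketch with made-up sample files and a hypothetical scratch file named colA:

```shell
# Hypothetical sample files, created in a scratch directory for this demo only.
dir=$(mktemp -d) && cd "$dir"
printf 'AAAA 1\nBBBB 2\nCCCC 3\n' > A
printf 'XXXX 3\nYYYY 2\nZZZZ 1\n' > B

# Column 2 of A goes to a scratch file; column 2 of the reversed B
# is piped into diff, which reads it as "-" (standard input).
awk '{print $2}' A > colA
if tac B | awk '{print $2}' | diff --brief colA - > /dev/null; then
    result="column 2 of A is the reverse of column 2 of B"
else
    result="the columns differ"
fi
echo "$result"
```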

cmbuckley