
This is my sample input file:

xxxxx,12345,yy,ABN,ABE,47,20171018130030,122021010147421,2,IN,3,13,9741588177,32
xxxxxx,9741588177,yy,ABN,ABE,54,20171018130030,122025010227014,2,IN,3,15,12345,32

I want to compare 2 consecutive lines in this file against these conditions:

  1. The 12th field of the 1st line and 12th field of the 2nd line must be 13 and 15, respectively.
  2. If the conditions in point 1 are met, then the 2nd field of line 1 (which has the 12th field value as 13) must match the 13th field of line 2 (which has the 12th field as 15).

The file contains many lines where the above conditions are not met. I would like to print only those lines which meet conditions 1 and 2.

Any help in this regard is greatly appreciated!

amseager

3 Answers


It's not clear if you want to compare the lines in groups of 2 (i.e., compare lines 1 and 2, and then lines 3 and 4) or serially (i.e., compare lines 1 and 2, and then lines 2 and 3). For the latter:

awk 'NR > 1 && prev_12 == 13 && $12 == 15 && 
    prev_2 == $13 {print prev; print $0} 
    {prev=$0; prev_12=$12; prev_2=$2}' FS=, input-file

For the former, add the condition NR % 2 == 0. (I'm assuming you intended to mention that the fields are comma-separated, which appears to be the case judging by the input.)
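
As a rough illustration of the difference between the two readings, the sketch below runs both variants on a five-line file. The two matching lines are the ones from the question; the filler line and the temporary file are assumptions made up for this example.

```shell
# The two lines from the question, plus one invented filler line (hypothetical data).
a='xxxxx,12345,yy,ABN,ABE,47,20171018130030,122021010147421,2,IN,3,13,9741588177,32'
b='xxxxxx,9741588177,yy,ABN,ABE,54,20171018130030,122025010227014,2,IN,3,15,12345,32'
filler='zzzzz,0,yy,ABN,ABE,0,20171018130030,0,2,IN,3,99,0,32'

# Layout: a b filler a b. The second a/b pair sits on lines 4 and 5,
# so it straddles the groups (3,4) and (5,6).
tmp=$(mktemp)
printf '%s\n' "$a" "$b" "$filler" "$a" "$b" > "$tmp"

# Serial comparison: every line is compared with the one before it.
serial=$(awk 'NR > 1 && prev_12 == 13 && $12 == 15 &&
    prev_2 == $13 {print prev; print $0}
    {prev=$0; prev_12=$12; prev_2=$2}' FS=, "$tmp")

# Grouped comparison: only test on even-numbered lines, i.e. (1,2), (3,4), ...
grouped=$(awk 'NR % 2 == 0 && prev_12 == 13 && $12 == 15 &&
    prev_2 == $13 {print prev; print $0}
    {prev=$0; prev_12=$12; prev_2=$2}' FS=, "$tmp")

printf 'serial:\n%s\n\ngrouped:\n%s\n' "$serial" "$grouped"
rm -f "$tmp"
```

With this layout the serial variant reports both pairs, while the grouped variant reports only the pair on lines 1 and 2, because the second pair does not start on an odd-numbered line.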

William Pursell
  • Thanks a ton! This works. When you use NR does it automatically take 2 consecutive lines? For example, the code "prev_12 == 13 && $12 == 15" is comparing field 12 of 2 consecutive lines. This was my problem area. – vijey kumar Feb 25 '19 at 21:07
  • The command is run once on each line. We do nothing on line 1 (`NR>1`) except save the line and some fields in "prev", "prev_12", and "prev_2". When we are reading a line, "prev", "prev_2", and "prev_12" contain the needed data from the previous line. – William Pursell Feb 25 '19 at 22:05
  • Sorry for not giving a clear picture of the requirement. I have an XML file which has 12 comma-separated fields. The fields of interest to me are fields 2, 12 and 13, because I want to know whether these fields are swapped in the 12th field (for values 13 and 15 respectively). As a first step, I searched for values 13 and 15 in the 12th field, sorted them by time, and then ran the command that you suggested on top of that. It has served the purpose so far. – vijey kumar Mar 04 '19 at 11:41
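
The per-line behaviour described in the comments above can be made visible with a small trace. This is only a sketch on a made-up two-field input (not the file from the question); it prints what the saved variable holds at the moment each line is read.

```shell
# Toy input (hypothetical): two comma-separated fields per line.
# The first block prints the state as seen when a line is read;
# the second block then saves the field for the next line.
trace=$(printf '%s\n' 'a,13' 'b,15' 'c,13' |
    awk -F, '{ printf "NR=%d prev_2=[%s] $2=[%s]\n", NR, prev_2, $2 }
             { prev_2 = $2 }')
printf '%s\n' "$trace"
```

On the first line the saved variable is still empty, which is why a condition such as prev_12 == 13 cannot be true there. From the second line onward it always holds the value from the line just before.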

Another awk approach:

$ awk -F, '$12==13 {p0=$0; p2=$2; c=1; next} 
           c&&c-- && $12==15 && p2==$13 {print p0; print}' file

This starts capturing only when $12 of the first line matches 13.

c&&c-- is a smart counter (a countdown here), which stops at 0 because of the first c before the ampersands. Ed Morton has a post with many more examples of smart counters.
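
To see the countdown in isolation, here is a minimal sketch on made-up input (the words are placeholders, not data from the question). It prints only the line that immediately follows each "start" line.

```shell
# Hypothetical input: six one-word lines, two of them "start" markers.
out=$(printf '%s\n' start one two start three four |
    awk '$1 == "start" { c = 1; next }   # arm the counter and skip the marker line
         c && c--                        # true exactly once per arming, then c rests at 0
        ')
printf '%s\n' "$out"
```

The leading c is what keeps the counter from going below zero: once c is 0 the expression short-circuits and c-- is never evaluated. A bare c-- would keep decrementing into negative numbers, which awk treats as true, so later lines would be printed by mistake.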

karakfa
  • Thanks for the command. This works just fine, but I am not familiar with some of the syntax. What specifically does this part of the command do? "c&&c-- &&" – vijey kumar Mar 04 '19 at 11:44

Wish you'd used a few more lines of sample input and provided expected output so we're not all just guessing but MAYBE this is what you want to do:

$ cat tst.awk
BEGIN { FS="," }
(p[12] == 13) && ($12 == 15) && (p[2] == $13) { print p[0] ORS $0 }
{ split($0,p); p[0]=$0 }

$ awk -f tst.awk file
xxxxx,12345,yy,ABN,ABE,47,20171018130030,122021010147421,2,IN,3,13,9741588177,32
xxxxxx,9741588177,yy,ABN,ABE,54,20171018130030,122025010227014,2,IN,3,15,12345,32
Ed Morton