
I'm new to both awk and perl, so please bear with me. I have the following awk script:

awk '/regex1/{p = 0;} /regex2/{p = 1;} p'

What this basically does is print all lines starting from a line matching regex2 until a line matching regex1 is found (the regex1 line itself is not printed).

Example:

 regex1
 regex2
 line 1
 line 2
 regex1
 regex2
 regex1

Output:

 regex2
 line 1
 line 2
 regex2

Is it possible to simulate this using a perl one-liner? I know I can do it with a script saved in a file.

Edit:

A practical example:

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,828 [INFO] 567890 (Blah : Blah1) Service-name:: Content( May span multiple lines)

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,831 [INFO] 567890 (Blah : Blah2) Service-name:: Content( May span multiple lines)

Given the search key 123456 I want to extract the following:

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

The following awk script does the job:
awk '/[0-9]{2}\s\w+\s[0-9]{4}/{n = 0} /123456/{n = 1} n' file

gitmorty
  • You know there is a program, awk2perl, which you could try? – JFS31 Jun 14 '17 at 12:21
  • for awk, see https://stackoverflow.com/a/38972737/4082052 for better ways... if you know how to write perl script, see https://stackoverflow.com/documentation/perl/3696/perl-one-liners#t=201706141257567028325 and http://perldoc.perl.org/perlrun.html#Command-Switches .. you'll want to use http://perldoc.perl.org/perlop.html#Range-Operators – Sundeep Jun 14 '17 at 13:00

2 Answers

perl -ne 'print if (/regex2/ .. /regex1/) =~ /^\d+$/'

This is slightly crazy, but here's how it works:

  • -n adds an implicit loop over the input lines
  • the current line is in $_
  • the two bare regex matches (/regex2/, /regex1/) implicitly test against $_
  • we use .. in scalar context, which turns it into a stateful flip-flop operator

    By that I mean: X .. Y starts out in the "false" state. In the "false" state it only evaluates X. If X returns a false value, it remains in the "false" state (and returns false itself). Once X returns a true value, it moves into the "true" state and returns true.

    In the "true" state it only evaluates Y. If Y returns false, it remains in the "true" state (and returns true itself). Once Y returns a true value, it moves into the "false" state but it still returns true.

  • had we just used print if /regex2/ .. /regex1/, it would have printed all the terminating regex1 lines, too

  • a close reading of Range Operators in perldoc perlop reveals that you can distinguish the end points of the range
  • the "true" value returned by .. is actually a sequence number starting from 1, so the start of a range can be identified by checking for 1
  • when the end of the range is reached (i.e. we're about to move from the "true" state to the "false" state again), the return value gets an "E0" tacked on to the end

    Adding "E0" to an integer doesn't affect its numeric value. Perl implicitly converts strings to numbers when needed, and something like "5E0" is just scientific notation (meaning 5 * 10**0, which is 5 * 1, which is 5).

  • the "false" value returned by .. is the empty string, ""

We check that the result of .. matches the regex /^\d+$/, i.e. is all digits. This excludes the empty string (because we require at least one digit to match), so we don't print lines outside of the range. It also excludes the last line in our range, because E is not a digit.

melpomene
  • Thanks for the explanation. That is crazy indeed. I actually gave a very general example in the question, for which your code works. I also need to print the cases where regex1 and regex2 are on the same line (giving priority to regex2). But I believe I can do that on my own, thanks to your explanation. – gitmorty Jun 15 '17 at 07:08
  • @AkhilAvinash That sounds like it can be done with something like `my $p = /regex2/ .. /regex1/; print if $p && ($p == 1 || $p !~ /E/);` – melpomene Jun 15 '17 at 07:22
  • No, that didn't do the job. When we have something like `regex1 regex2`, it only prints that line but not the lines after, as the value of $p is somehow set to 1E0 inside the line itself. I believe that each $_ is matched with both /regex2/ and /regex1/, and hence the range ends inside the line itself. Let me know if there's a way around this. – gitmorty Jun 15 '17 at 09:24
  • @AkhilAvinash Oh! If I understand you correctly, that's just `...` instead of `..`. – melpomene Jun 15 '17 at 10:54
  • I'm sorry this is becoming such a mess. But there's another problem. When we have: `regex1 regex2`\n `multiple lines` \n `regex1 regex2`, the script doesn't print the last line. The practical example given above is what I'm testing on. My script is `perl -ne 'my $p = (/123456/.../[0-9]+ \w+ [0-9]{4}/); print if $p && ($p == 1 || $p !~ /E/);' file` – gitmorty Jun 15 '17 at 11:12
0

Not sure if awk prints both the start and end of the range, but Perl does:

perl -ne 'if(/regex2/ ... /regex1/){print}' file

Edit: Awk (at least GNU awk) also has a range operator, so this could have been done more simply as:

awk '/regex2/,/regex1/' file
stark
  • I actually need the script to print only the start and the lines in between, excluding the end of the range. The awk script does exactly that. Is there a way to modify your perl one-liner to do the same? – gitmorty Jun 15 '17 at 06:10