I have random data coming in from a source into a file. I have to read through the file and extract only the portion of data that falls between particular patterns.

Example: Let's suppose the file myfile.out looks like this.

info-data
some more info-data
=================================================================
some-data
some-data
some-data
=================================================================

======================= CONFIG PARMS : ==========================
some-data
some-data
some-data
=================================================================

======================= REQUEST PARAMS : ========================
some-data
some-data
some-data
=================================================================

===================== REQUEST RESULTS ===========================
some-data
=================================================================
some-data
some-data
=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================

some-info-data

I'm looking only for the data that matches this particular pattern:

=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================

I did try to look around a bit, like

How to select lines between two marker patterns which may occur multiple times with awk/sed

Bash. How to get multiline text between tags

But the awk and sed solutions given there don't seem to work; the commands produce neither errors nor output.

I tried this

PATTERN1="================================================================="
PATTERN2="==========================F I N I S H============================"
awk -v PAT1="$PATTERN1" -v PAT2="$PATTERN2" 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' myfile.out

and

PATTERN1="================================================================="
PATTERN2="==========================F I N I S H============================"
awk  -v PAT1="$PATTERN1" -v PAT2="$PATTERN2" 'PAT1 {flag=1;next} PAT2 {flag=0} flag { print }' file

Maybe it is due to the pattern? Or am I doing something wrong?

Script will run on RHEL 6.5.

Marcos

4 Answers

Assuming you only need the data and not the pattern, using GNU awk:

awk -v RS='\n={26,}[ A-Z]*={28,}\n' 'RT~/F I N I S H/' file

The record separator RS is set to match lines consisting of a series of = with some optional uppercase characters in between.

The only statement checks whether the record terminator RT (of the current record) contains the F I N I S H keyword. If so, awk prints the whole record, which consists of multiple lines.
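
The one-liner above depends on GNU awk's RT variable and on interval expressions such as {26,}, which, as far as I know, gawk releases older than 4.0 only honor when run with --re-interval. As a more portable alternative, here is a minimal sketch that compares whole lines as strings. The sample file contents and the names pat1, pat2, buf and inblock are illustrative assumptions, not part of the original answer. It also sidesteps the problem in the question's attempts, where /PAT1/ searches for the literal text PAT1 rather than for the variable's value.

```shell
PATTERN1="================================================================="
PATTERN2="==========================F I N I S H============================"

# Build a small stand-in for myfile.out (contents are made up for illustration).
sample=$(mktemp)
printf '%s\n' "info-data" "$PATTERN1" "some-data" "$PATTERN1" "" \
  "===================== REQUEST RESULTS ===========================" \
  "some-data" "$PATTERN1" "Data-I-Need" "Data-I-Need" "$PATTERN2" "" \
  "some-info-data" > "$sample"

result=$(awk -v pat1="$PATTERN1" -v pat2="$PATTERN2" '
  # A line of only = starts a new candidate block: reset the buffer.
  $0 == pat1 { buf = ""; inblock = 1; next }
  # The F I N I S H line prints the buffer if a candidate block is open.
  $0 == pat2 { if (inblock) printf "%s", buf; inblock = 0; buf = ""; next }
  # Any other line starting with = is a section header: cancel the candidate.
  /^=/       { inblock = 0; buf = ""; next }
  # Inside a candidate block, accumulate the line.
  inblock    { buf = buf $0 ORS }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because every candidate block is discarded as soon as another header appears, only the block that runs straight into the F I N I S H line is printed, without its header and footer lines.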

oliv

sed can handle this.

Assuming you want to keep the header and footer lines -

$: sed -En '/^=+$/,/^=+F I N I S H=+$/ { /^=+$/ { x; d; }; /^[^=]/ { H; d; }; /^=+F I N I S H=+$/{ H; x; p; q; }; }' infile
=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================

If not, use

sed -En '/^=+$/,/^=+F I N I S H=+$/ { /^=+$/ { s/.*//g; x; d; }; /^[^=]/ { H; d; }; /^=+F I N I S H=+$/{ x; p; q; }; }' infile

Note that if you aren't using GNU sed, you'll need to use newlines instead of all those semicolons.

sed -En '
  /^=+$/,/^=+F I N I S H=+$/ {
    /^=+$/ {
      s/.*//g
      x
      d
    }
    /^[^=]/ {
      H
      d
    }
    /^=+F I N I S H=+$/{
      x
      p
      q
    }
}' infile

Data-I-Need
Data-I-Need
...
...
...
Data-I-Need

Breaking it down -

sed -En '...'

The -En says to use extended pattern matching (the -E, which I really only used for the +'s), and not to output anything unless specifically asked (the -n).

/^=+$/,/^=+F I N I S H=+$/ {...}

says to execute these commands only between lines that are all ='s and lines that are all ='s except for F I N I S H in the middle somewhere. All the stuff between the {}'s will be checked on all lines between those. That does mean from the first =+ line, but that's ok, we handle that inside.

(a) /^=+$/ { x; d; };
(b) /^=+$/ { s/.*//g; x; d; };

(a) says on each of the lines that are all ='s, swap (x) the current line (the "pattern space") with the "hold space", then delete (d) the pattern space. That keeps the current line and deletes whatever you might have accumulated above on false starts. (Remember -n keeps anything from printing till we want it.)

(b) says erase the current line first, THEN swap and delete. It will still add a newline. Did you want that removed?

/^[^=]/ { H; d; };

Both versions use this. On any line that does not start with an =, add it to the hold space (H), then delete the pattern space (d). The delete always restarts the cycle, reading the next record.

(a) /^=+F I N I S H=+$/{ H; x; p; q; };
(b) /^=+F I N I S H=+$/{ x; p; q; };

On any line with the sentinel F I N I S H string between all ='s, (a) will first append (H) the pattern to the hold space - (b) will not. Both will then swap the pattern and hold spaces (x), print (p) the pattern space (which is now the value accumulated into the hold space), and then quit (q), so sed stops without reading the rest of the file.

If you replaced the q with a d, you would instead fall outside the initial toggle at that point, so unless another row of all ='s happens, you'd skip all the remaining lines. If one does, it will again begin to accumulate records, but will not print them unless it hits another F I N I S H record.

}' infile

This just closes the script and passes in whatever filename you were using. Note that this is not an in-place edit...
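
If the leading empty line produced by version (b) is unwanted, one possible tweak is to strip it just before printing. The sketch below assumes GNU sed (for -E, the one-line blocks, and \n in the regex) and uses a made-up sample file and the hypothetical variable names P1, P2 and trimmed. It is an illustration, not part of the original script.

```shell
P1="================================================================="
P2="==========================F I N I S H============================"

# Made-up stand-in for the input file.
sample=$(mktemp)
printf '%s\n' "info-data" "$P1" "some-data" "$P1" \
  "Data-I-Need" "Data-I-Need" "$P2" "some-info-data" > "$sample"

trimmed=$(sed -En '
  /^=+$/,/^=+F I N I S H=+$/ {
    /^=+$/ { s/.*//; x; d; }
    /^[^=]/ { H; d; }
    /^=+F I N I S H=+$/ {
      # Bring the accumulated lines into the pattern space.
      x
      # Drop the newline that H placed in front of the first line.
      s/^\n//
      p
      q
    }
  }' "$sample")

printf '%s\n' "$trimmed"
rm -f "$sample"
```

The only change from version (b) is the extra s/^\n// between x and p; everything else behaves as described above.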

Hope that helps.

Paul Hodges

Although there is already a sed solution there, I like sed for its simplicity:

sed -n '/^==*\r*$/,/^==*F I N I S H/{H;/^==*[^F=]/h;${g;p}}' file

This sed command defines a range for the commands to run against. The range starts at a line that consists only of = characters (optionally followed by carriage returns) and ends at a line that starts with = characters followed by F I N I S H. Now the commands:

H appends each line to the hold space. Then /^==*[^F=]/h runs on the header or footer lines of other sections and replaces the hold space with the current pattern space.

On the last line, ${g;p} replaces the pattern space with the contents of the hold space and prints it. The whole thing outputs this:

=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================
revo
  • I do not need the header and footer lines. – Marcos Nov 09 '18 at 16:56
  • Then pipe results to `sed '1d;$d;'` – revo Nov 09 '18 at 18:08
  • for some reason your solution didn't work out, I'll test further to understand why. – Marcos Nov 14 '18 at 11:49
  • Please elaborate. *it didn't work* does not describe what the problem is, whether or not you get any result and if yes what is that. However I support @potong's answer. That has a nice idea behind. – revo Nov 14 '18 at 11:53

This might work for you (GNU sed):

sed -r '/^=+$/h;//!H;/^=+F I N I S H=+$/!d;x;s/^[^\n]*\n|\n[^\n]*$//g' file

Store a line containing only ='s in the hold space (replacing anything that was there before). Append all other lines to hold space. If the current line is not a line containing ='s followed by F I N I S H followed by ='s, delete it. Otherwise, swap to the hold space, remove the first and last lines and print the remainder.
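
A quick way to check the command is to run it against a small sample. The file contents below are a made-up stand-in for myfile.out, the variable names P1, P2 and extracted are hypothetical, and GNU sed is assumed, as in the answer.

```shell
P1="================================================================="
P2="==========================F I N I S H============================"

# Made-up stand-in for the input file, with trailing lines after F I N I S H.
sample=$(mktemp)
printf '%s\n' "info-data" "$P1" "some-data" "$P1" \
  "Data-I-Need" "Data-I-Need" "$P2" "" "some-info-data" > "$sample"

# Same command as in the answer, applied to the sample.
extracted=$(sed -r '/^=+$/h;//!H;/^=+F I N I S H=+$/!d;x;s/^[^\n]*\n|\n[^\n]*$//g' "$sample")

printf '%s\n' "$extracted"
rm -f "$sample"
```

On this sample the earlier some-data block is discarded when the next line of ='s overwrites the hold space, so only the lines directly before the F I N I S H line remain.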

potong