0

I have a situation similar to the question "Bash, grep between two lines with specified string". I have a text file with output in the following format:

HEADER A
lines of output
----------------
HEADER B
lines of output
----------------
...rinse and repeat...

I want to match all the blocks with the same header. grep does not seem sufficient for this task, and I am only vaguely familiar with awk and sed, just enough to realize they might be the most appropriate tools here. So how do I match a block that is enclosed by a matching HEADER line and a ---------- line?

My attempt based on the linked question is

awk '/HEADER/{f=1} /-/{f=0;print} f' filename.txt

However, this still matches some of the lines in the blocks with the second header.

Code-Apprentice
  • Maybe you need to expand your input file a bit and show what the wrong output you get looks like... – George Vasiliou Jul 16 '17 at 18:15
  • @GeorgeVasiliou Just edited to show more clearly the file format. – Code-Apprentice Jul 16 '17 at 18:15
  • Why not `awk '/HEADER A/{f=1}/-------/{f=0;print}f' file.txt` ...? – George Vasiliou Jul 16 '17 at 18:24
  • @GeorgeVasiliou That is almost a solution. I just realized that my original `awk` command matched unwanted lines with negative numbers. Your proposal also matches **all** lines with dashes, even those which terminate a block with a header other than the one I am trying to match. This is usable, just not 100% ideal. – Code-Apprentice Jul 16 '17 at 19:19
  • 1
    _I want to match all the blocks with the same header_ What does it mean? What is the expected output? – James Brown Jul 16 '17 at 23:19

2 Answers

1

Adjusting this answer to fit the problem, I got:

sed -n '/HEADER/,/-/p' filename.txt

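This is rather brittle (the range ends at the first line that contains a hyphen anywhere, such as a line with a negative number), so something like

sed -n '/HEADER/,/^--*$/p' filename.txt

which ends the range only at a line consisting entirely of hyphens, might be preferable. The pattern is written as ^--*$ rather than ^-+$ because sed uses basic regular expressions by default, where + is not portably a repetition operator (sed -E enables extended syntax). Replace HEADER with the full header text (for example HEADER A) to select only those blocks. As far as I can tell (not a sed expert), each regex between the slashes is tested against one line at a time, so ^ and $ anchor to the start and end of that line.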
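As a rough sketch of the stricter range, the snippet below runs the idea against a made-up sample in which the HEADER B block contains a negative number. The sample text and the variable names `sample` and `sed_out` are assumptions for illustration, not part of the original file.

```shell
# Made-up sample: the HEADER B block contains a negative number.
sample=$(printf '%s\n' \
  'HEADER A' 'value 1' '----------------' \
  'HEADER B' 'value -2' 'value 3' '----------------')

# Range: from the exact header line to the next line made only of hyphens.
# '^--*$' means "a hyphen followed by zero or more hyphens" in basic regex.
sed_out=$(printf '%s\n' "$sample" | sed -n '/^HEADER B$/,/^--*$/p')
printf '%s\n' "$sed_out"
```

Because the end address must match the whole line, the "value -2" line stays inside the range instead of ending it early.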

Aran-Fey
  • My little bit of research showed that the `'x,y'` syntax matches multiple lines from `x` to `y`, which can be specified with regexes. As far as I can tell, each regex only matches individual lines, though. – Code-Apprentice Jul 16 '17 at 18:29
  • I already encountered the brittleness of your first solution. The blocks with the second header contain negative numbers. – Code-Apprentice Jul 16 '17 at 18:31
1

For a file like this:

$ cat file1
HEADER A
lines of output1.1
----------------
HEADER B
lines of output2.1
----------------
HEADER A
lines of output1.2
----------------
HEADER B
lines of output2.2
----------------
HEADER A
lines of output1.3
----------------
HEADER B
lines of output2.3
----------------

Something like this prints all the HEADER A blocks:

$ awk '/HEADER A/{f=1} /-------/ && f==1{f=0;print} f' file1
HEADER A
lines of output1.1
----------------
HEADER A
lines of output1.2
----------------
HEADER A
lines of output1.3
----------------

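You just need to add an AND condition (&& f==1) to the terminating rule, so that a line of dashes is printed only when it closes a block that is currently being matched.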
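A possible variation on the same idea passes the header in as an awk variable so that it is compared as an exact string rather than as a regex. This is only a sketch: the variable names `hdr`, `sample`, and `awk_out` are hypothetical, and the sample is a shortened copy of file1.

```shell
# Shortened copy of the sample file, held in a variable for the demo.
sample=$(printf '%s\n' \
  'HEADER A' 'lines of output1.1' '----------------' \
  'HEADER B' 'lines of output2.1' '----------------' \
  'HEADER A' 'lines of output1.2' '----------------')

# hdr is a hypothetical variable holding the exact header text to select.
awk_out=$(printf '%s\n' "$sample" | awk -v hdr='HEADER A' '
  $0 == hdr   { f = 1 }          # a wanted block starts here
  f && /^-+$/ { f = 0; print }   # closing line, printed only for wanted blocks
  f                              # default action: print while inside a block
')
printf '%s\n' "$awk_out"
```

Matching the terminator with /^-+$/ rather than a bare run of dashes also keeps lines that merely contain hyphens from closing a block.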

If this is not what you need, I'm afraid you should edit your question to make it clearer.

George Vasiliou
  • Could you explain what the function of the `f` at the end of your command is? If `f` is 1, then print the record, is this correct? – FloHe Jul 16 '17 at 20:32
  • 1
    @FloHe In `awk`, the trailing `f` is shorthand for `if f==1 then print` (or, in awk code, `f==1{print $0}`). More precisely, a bare `f` is a condition that is true when `f` is set to something other than zero or the empty string. Moreover, in awk's `condition{action}` syntax the `{action}` part can be omitted, in which case the default action is performed: print the line, i.e. `{print $0}`. – George Vasiliou Jul 16 '17 at 20:41
  • Good answer, thanks – FloHe Jul 16 '17 at 20:44
  • @Code-Apprentice Did this work as expected? – George Vasiliou Jul 16 '17 at 20:52
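
The shorthand discussed in the comments above can be checked with a small sketch that runs the short and the spelled-out form on the same made-up input and compares the results. The input lines and the variable names `input`, `short`, and `long` are assumptions for illustration.

```shell
input=$(printf '%s\n' 'skip' 'START' 'keep 1' 'keep 2')

# Pattern with no action: awk falls back to the default action, print $0.
short=$(printf '%s\n' "$input" | awk '/START/{f=1} f')
# The same program with the condition and the action written out.
long=$(printf '%s\n' "$input" | awk '/START/{f=1} f != 0 {print $0}')

printf '%s\n' "$short"
[ "$short" = "$long" ] && echo "same output"
```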