
I have a file with content like the following, where each START/STOP pair delimits a block:

START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP

I would like to get the values of X and Y printed in the format below for each block:

123 ~~ abc
456 ~~ 
789 ~~ ghi

If there were only a single START/STOP block, sed -n '/START/,/STOP/p' would have been enough. Since the blocks repeat, I need your help.

Ed Morton
Sathy

3 Answers


Sed is always the wrong choice for any problem that involves processing multiple lines. All of sed's arcane constructs for doing so became obsolete in the mid-1970s when awk was invented.

Whenever you have name-value pairs in your input, I find it useful to create an array that maps each name to its value and then access the array by name. In this case, using GNU awk for the multi-character RS and for deleting a whole array:

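$ cat tst.awk
BEGIN {
    RS = "\nSTOP\n"          # each record is one block, ending at a STOP line
    OFS = " ~~ "
}
{
    delete n2v               # clear the name-to-value map left over from the previous block
    # Fields are: START, name, "|", value, name, "|", value, ...
    # so the names sit at $2, $5, $8, ... and each value two fields later.
    for (i=2;i<=NF;i+=3) {
        n2v[$i] = $(i+2)
    }
    print n2v["X"], n2v["Y"]
}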

$ gawk -f tst.awk file
123 ~~ abc
456 ~~ 
789 ~~ ghi
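If GNU awk is not available, the same name-to-value array idea can be applied one line at a time, which needs only POSIX awk features. The following is a minimal sketch, assuming the input looks exactly like the sample (one "name | value" pair per line, values without spaces). The function name print_xy is hypothetical, and split("", n2v) is used as the traditional portable way to empty an array in place of gawk's delete n2v.

```shell
#!/bin/sh
# Hypothetical helper: read START/STOP blocks on stdin and print "X ~~ Y" per block.
# Sketch only: assumes each data line has the form "name | value" with no spaces in value.
print_xy() {
    awk -v OFS=" ~~ " '
        /^START$/ { split("", n2v); inblock = 1; next }            # new block: empty the map
        /^STOP$/  { print n2v["X"], n2v["Y"]; inblock = 0; next }  # end of block: report X and Y
        inblock   { n2v[$1] = $3 }                                 # $1 is the name, $3 the value
    '
}

# Sample input from the question.
printf '%s\n' START 'X | 123' 'Y | abc' 'Z | +=-' STOP \
              START 'X | 456' 'Z | +%$' STOP \
              START 'X | 789' 'Y | ghi' 'Z | !@#' STOP | print_xy
```

On the sample input this should print the same three lines as the gawk version, including the empty Y value for the second block.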
Ed Morton
    I like the idea of storing the values in an array, +1 and a moral +1 for also adding an explanation :) ! – fedorqui Feb 10 '15 at 17:02
    Haha it is funny that you kind of end up apologising for explaining your answer ;) And yes, it is indeed quite useful to read – fedorqui Feb 10 '15 at 17:04

Based on my own solution to "How to select lines between two marker patterns which may occur multiple times with awk/sed":

awk -v OFS=" ~~ " '
       /START/{flag=1;next}                                   # entering a block
       /STOP/{flag=0; print first, second; first=second=""}   # end of block: print, then reset
       flag && $1=="X" {first=$3}                             # inside a block: remember X
       flag && $1=="Y" {second=$3}                            # ... and Y
' file

Test

$ awk -v OFS=" ~~ " '/START/{flag=1;next}/STOP/{flag=0; print first, second; first=second=""} flag && $1=="X" {first=$3} flag && $1=="Y" {second=$3}' a
123 ~~ abc
456 ~~ 
789 ~~ ghi
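The answer builds on the common awk flag idiom for selecting lines between two markers. As a sketch, the idiom on its own looks like the following; it assumes the marker lines contain START and STOP, and the function name between_markers is hypothetical. The next on the START rule, together with clearing the flag before the bare flag pattern runs on the STOP line, keeps both marker lines out of the output.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines strictly between START and STOP markers.
between_markers() {
    # Set the flag on START and skip that line; clear it on STOP, so the bare
    # "flag" pattern (whose default action is print) never prints either marker.
    awk '/START/{flag=1; next} /STOP/{flag=0} flag'
}

# The "junk" line sits outside any block and is therefore not printed.
printf '%s\n' START 'X | 123' 'Y | abc' STOP junk START 'X | 456' STOP | between_markers
```

The answer above replaces the final bare flag pattern with the two flag && $1=="..." rules and moves the printing into the STOP rule.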
fedorqui

Because I like brain teasers (not because this sort of thing is practical to do in sed), a possible sed solution is

sed -n '/START/,/STOP/ { //!H; // { g; /^$/! { s/.*\nX | \([^\n]*\).*/\1 ~~/; ta; s/.*/~~/; :a G; s/\n.*Y | \([^\n]*\).*/ \1/; s/\n.*//; p; s/.*//; h } } }'

This works as follows:

/START/,/STOP/ {                        # between two start and stop lines
  //! H                                 # assemble the lines in the hold buffer
                                        # note that // repeats the previously
                                        # matched pattern, so // matches the
                                        # start and end lines, //! all others.

  // {                                  # On a boundary line (START or STOP),
    g                                   # copy the hold buffer into the pattern
    /^$/! {                             # space and, if it is not empty (that
                                        # is, at the STOP line), process it:

      s/.*\nX | \([^\n]*\).*/\1 ~~/     # isolate the X value, append ~~

      ta                                # if an X value was found, jump to :a;
      s/.*/~~/                          # otherwise just use ~~
      :a

      G                                 # append the hold buffer to that
      s/\n.*Y | \([^\n]*\).*/ \1/       # and isolate the Y value so that
                                        # the pattern space contains X ~~ Y

      s/\n.*//                          # Cutting off everything after a newline
                                        # is important if there is no Y value
                                        # and the previous substitution did
                                        # nothing

      p                                 # print the result

      s/.*//                            # and make sure the hold buffer is
      h                                 # empty for the next block.
    }
  }
}
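A much smaller script that uses the same mechanism (//!H to collect the lines between the markers in the hold buffer, g at the STOP line, then an emptied hold buffer for the next block) may make the walkthrough easier to follow. This is a sketch assuming GNU sed, where // inside the range reuses whichever range address was tested last on the current line; the helper name collect_blocks and the " ; " separator are illustrative choices, not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper: join the lines of each START..STOP block with " ; ".
collect_blocks() {
    sed -n '
        /START/,/STOP/ {
            # Inside the range, // repeats the range address just tested,
            # so //!H appends only the lines between the markers.
            //!H
            /STOP/ {
                # At the end of a block: fetch the collected lines,
                g
                # drop the leading newline that the first H added,
                s/^\n//
                # join the lines and print the result,
                s/\n/ ; /g
                p
                # then empty the hold buffer for the next block.
                s/.*//
                h
            }
        }
    '
}

printf '%s\n' START 'X | 123' 'Y | abc' STOP START 'X | 456' STOP | collect_blocks
```

The full answer then replaces the join step with the two substitutions that pick out the X and Y values.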
Wintermute
    What can I say? I've been given a handful of answers. Thanks, all. With a sample piece of data, Wintermute's solution takes 0m0.151s, Ed Morton's takes 0m0.160s and fedorqui's takes 0m0.163s. Thanks again to all. – Sathy Feb 10 '15 at 17:34
    IMHO the speed of execution between a sed and an awk solution will never be an issue. Just try modifying one of them to, say, print a debugging statement for every line read or print a count at the end of how many times you found a "Y" or .... – Ed Morton Feb 10 '15 at 17:46
    There's the `l` command for that. :P Seriously though, you'll want to use one of the awk solutions. I don't agree that awk is always better (mostly because it doesn't have backrefs), but here it's no contest. I mean, look at this, and look at @fedorqui's solution. One of them is human-readable, the other is mine. You don't want to pull in unmaintainable code for 7% runtime. I wrote this for fun. – Wintermute Feb 10 '15 at 17:54
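To make the maintainability point from the comments concrete: extending the awk version to also report how many Y lines were seen takes two small changes. This is a sketch based on fedorqui's script above; the helper name xy_with_count and the wording of the summary line are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: fedorqui's flag-based awk, plus a count of Y lines.
xy_with_count() {
    awk -v OFS=" ~~ " '
        /START/ { flag = 1; next }
        /STOP/  { flag = 0; print first, second; first = second = "" }
        flag && $1 == "X" { first = $3 }
        flag && $1 == "Y" { second = $3; ycount++ }       # added: count Y lines
        END { printf "Y lines found: %d\n", ycount }       # added: report the count
    '
}

printf '%s\n' START 'X | 123' 'Y | abc' STOP START 'X | 456' STOP | xy_with_count
```

Making the equivalent change to the sed script would mean adding a counter to the hold-buffer logic, which sed has no arithmetic for.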