
I am parsing a log file that contains many types of XML documents. I am using awk to extract a specific part of one type of XML, with the following command:

        awk '/<TAG>/,/<\/TAG>/' "${LOG}" > OUTPUT.txt

However, these are inner tags, not the outermost tags of the XML. The log contains several types of XML that all share the same generic top and bottom tags, and I only want one specific type, so I need to add the opening tag before each extracted block and the closing tag after it to make each block complete.

The question is:

Is there a way I can add text before and after each awk iteration?

example:

Input:

<TAG>
    <InnerTAG>
    </InnerTAG>
</TAG>
<TAGTWO>
    <InnerTAG>
    </InnerTAG>
</TAGTWO>
<TAG>
    <InnerTAG>
    </InnerTAG>
</TAG>

Output:

TOP
<TAG>
    <InnerTAG>
    </InnerTAG>
</TAG>
BOTTOM
TOP
<TAG>
    <InnerTAG>
    </InnerTAG>
</TAG>
BOTTOM

Here TOP and BOTTOM are two different pieces of text that I add myself, for example with separate print statements.

PS: I have no way to know in advance how many matching blocks there are; the number varies from file to file.

Thanks,

Gal Appelbaum
  • See http://stackoverflow.com/q/23934486/258523 for discussion on why the range syntax isn't all that useful and to see an equivalent script which is more useful and should let you do exactly what you want here with little effort. – Etan Reisner Aug 31 '15 at 16:08
  • I'm confused - are you trying to add text around `...` as stated in your text or `...` as shown in your example or is it something to do with `...`? Please edit your question to be precise and to show both testable sample input AND expected output. – Ed Morton Aug 31 '15 at 17:29
  • your close TAGs are incorrect. – karakfa Aug 31 '15 at 19:51
  • @EtanReisner I'll take a look at it as soon as I can. Let's see if it helps :) – Gal Appelbaum Sep 01 '15 at 00:57

1 Answer

Try something like

awk '/<TAG>/ {print "Before"}
     /<TAG>/,/<\/TAG>/ {print}
     /<\/TAG>/ {print "After"}' "${LOG}" > OUTPUT.txt

Replace "Before" and "After" with whatever text you want.

toth
  • Never use range expressions as they require you to duplicate conditions when the problem becomes even marginally interesting, as shown in this answer. Just use a flag. – Ed Morton Aug 31 '15 at 17:31
  • @EdMorton can you elaborate a bit? – Gal Appelbaum Sep 02 '15 at 09:30
  • @toth The answer you provided half works: it does print above and below, but it also prints blank lines. – Gal Appelbaum Sep 02 '15 at 09:33
  • @GalAppelbaum, if I give it the input you specified, I get exactly your specified output (if I change "Before" to "TOP" and "After" to "BOTTOM"), so I don't understand the issue. – toth Sep 02 '15 at 12:32
  • @GalAppelbaum notice that in this solution the conditions `/<TAG>/` and `/<\/TAG>/` each occur in 2 places. Any time you have to duplicate hard-coded values in software, you are writing bad software: if you need to change the values later you have to do it in multiple places, which is time-consuming and prone to breakage if you miss one. See also http://stackoverflow.com/q/23934486/1745001. – Ed Morton Sep 02 '15 at 14:40
  • @EdMorton I don't completely understand the post... I am going to take some time to figure this out then optimize the code... for now this will have to do... Thanks for the insight though! – Gal Appelbaum Sep 03 '15 at 14:08
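
The flag approach suggested in the comments could be sketched roughly as follows. This is not code from the thread, only one possible reading of "just use a flag". The variable names `inside`, `top`, and `bottom` and the temporary sample file are assumptions made for illustration; the sample input mirrors the example in the question.

```shell
#!/bin/sh
# Hypothetical sample input, mirroring the example in the question.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
<TAG>
    <InnerTAG>
    </InnerTAG>
</TAG>
<TAGTWO>
    <InnerTAG>
    </InnerTAG>
</TAGTWO>
<TAG>
    <InnerTAG>
    </InnerTAG>
</TAG>
EOF

# Each pattern appears only once; the flag "inside" records whether the
# current line belongs to a <TAG>...</TAG> block.
# top and bottom are placeholder strings passed in with -v.
result=$(awk -v top="TOP" -v bottom="BOTTOM" '
    /<TAG>/    { print top; inside = 1 }      # block starts: header, then printing on
    inside     { print }                      # print every line while inside a block
    /<\/TAG>/  { print bottom; inside = 0 }   # block ends: footer, then printing off
' "$LOG")

printf '%s\n' "$result"
rm -f "$LOG"
```

Because the start and end patterns are each written once, changing the tag name or the surrounding text means editing a single place. The order of the three rules matters: the header is printed before the opening line, and the footer after the closing line.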