
I have the following data:

Example line 0</span>
<tag>Example line 1</tag>
<span>Example line 1.5</span>
--Hello Example line 1.7
<tag>
Example line 2
</tag>
--Hello Example line 2.7
<span>Example line 4</span>

Using this command:

awk -v RS='</tag>' 'RT {gsub(/.*?<tag>|\n/, ""); print "<tag>" $0 RT}'

I get:

<tag>Example line 1</tag>
<tag>Example line 2</tag>

However, I want the output to be:

<tag>Example line 1</tag>
--Hello Example line 1.7
<tag>Example line 2</tag>
--Hello Example line 2.7

Question:

I would like to know how to add an "or" alternative so that the command also matches any line that begins with --Hello. What would be the proper way to implement this in my code?

Other options:

Alternatively, I could use grep -o '<tag.*tag>\|^--.*', but I would also need a way to match across newlines (as asked here: Match Anything In Between Strings For Linux Grep Command).

Any help is highly appreciated.

DomainsFeatured
  • Hi @EdMorton, noted. I would be open to other solutions as well :-) – DomainsFeatured Oct 14 '16 at 21:56
  • OK I posted a solution. btw whatever you think `.*?` means, you are wrong as it's simply ERE regexp nonsense. If you tell us what you thought it'd do we can tell you how to do whatever it is in awk. – Ed Morton Oct 14 '16 at 22:07

2 Answers

You can modify your earlier awk command to this:

awk -v RS='</tag>' '/\n--Hello /{print gensub(/.*\n(--Hello [^\n]*).*/, "\\1", "1")}
       RT{gsub(/.*<tag>|\n/, ""); print "<tag>" $0 RT}' file

<tag>Example line 1</tag>
--Hello Example line 1.7
<tag>Example line 2</tag>
--Hello Example line 2.7
anubhava

$ cat tst.awk
BEGIN { RS="--Hello[^\\n]+|<\\/tag>" }
RT { print (RT~/^--/ ? "" : gensub(/.*(<tag>)/,"\\1",1)) RT }

$ awk -f tst.awk file
<tag>Example line 1</tag>
--Hello Example line 1.7
<tag>
Example line 2
</tag>
--Hello Example line 2.7

The above uses GNU awk for multi-char RS, RT, and gensub().
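
If GNU awk is not available, one possible alternative is a line-by-line state machine that relies only on POSIX awk features (index() and substr()). This is a sketch of a different technique, not the method used in either answer. The function name extract_tags and the variables in_tag, buf, start, and stop are made up for illustration. It assumes tags are not nested and that --Hello lines of interest sit outside any tag block. Unlike the output above, it joins a multi-line block onto a single line, which is the form the question asks for.

```shell
# Hypothetical helper: print each <tag>...</tag> block on one line, plus any
# line that starts with --Hello. Uses only POSIX awk (no RT, no gensub()).
extract_tags() {
    awk '
        # The "or" branch: outside a tag block, pass --Hello lines through.
        !in_tag && /^--Hello/ { print; next }
        {
            line = $0
            while (line != "") {
                if (!in_tag) {
                    start = index(line, "<tag>")
                    if (start == 0) break              # no opening tag left on this line
                    line = substr(line, start + 5)     # skip past "<tag>"
                    in_tag = 1
                    buf = ""
                } else {
                    stop = index(line, "</tag>")
                    if (stop == 0) {                   # block continues on the next line
                        buf = buf line
                        break
                    }
                    print "<tag>" buf substr(line, 1, stop - 1) "</tag>"
                    line = substr(line, stop + 6)      # skip past "</tag>"
                    in_tag = 0
                }
            }
        }
    ' "$@"
}
```

Call it as extract_tags file, or pipe data into it with no arguments. Because the two cases are written as separate awk rules, adding another "or" condition is a matter of adding another pattern-action line.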

Ed Morton
  • Of course. Any program of any length in any language can be written on 1 line. – Ed Morton Oct 15 '16 at 00:29
  • Hi Ed, that's cool. Good to know. For those of us who are just learning on the job and working without proper IT backgrounds or programming degrees, these answers and insights help fill our gaps in knowledge. Thank you for contributing and helping out. For anyone like myself, your skills are highly appreciated :-) – DomainsFeatured Oct 15 '16 at 02:22