I have a log file like this:

some text line
other text line
<a>
  <b>1</b>
  <c>2</c>
</a>
another text line
<a>
  <b>1</b>
  <c>2</c>
</a>
yet another text line

I need to get only the first occurrence of the XML element "a":

<a>
  <b>1</b>
  <c>2</c>
</a>

I know

awk '/<a>/,/<\/a>/' file.log

will find all occurrences. How can I get just the first? (Adding | head -n1 obviously doesn't work because it captures only the first line, and I can't know in advance how long "a" is: the awk expression must be generic because I have different log files with different "a" contents.)

Diego Shevek
  • This is a good start, https://stackoverflow.com/questions/38972736/how-to-print-lines-between-two-patterns-inclusive-or-exclusive-in-sed-awk-or just exit after the first match. – James Brown Oct 03 '19 at 15:18
  • Post valid XML/HTML and use xmlstarlet. – Cyrus Oct 03 '19 at 16:15
  • You should have given the 2 `<a>..</a>` blocks different content so when we're testing a potential solution we can tell if it's the first or 2nd block being output. – Ed Morton Oct 03 '19 at 17:56

3 Answers

This awk:

awk '
match($0,/<a>/) {
    $0=substr($0,RSTART)
    flag=1
}
flag && match($0,/<\/a>/) {
    $0=substr($0,1,RSTART+RLENGTH-1)
    print
    exit
}
flag' file

can handle these forms:

The above awk handles this:
<a><b>1</b><c>2</c></a>
and this:
<a>
  <b>1</b>
  <c>2</c>
</a>
and also <a>
  <b>1</b>
  <c>2</c>
</a> this
the end

Another for GNU awk:

$ gawk -v RS="</?a>" '
NR==1 { printf "%s", RT }
NR==2 { print $0 RT; exit }
' file
James Brown

Another slight variation is to use a simple flag variable to indicate when you are inside the first <a>...</a> block, output that block, and exit afterwards. Here n is the flag that is set once the first block starts, e.g.

awk -v n=0 '$1=="</a>" {print $1; exit} $1=="<a>" {n=1}; n==1' f.xml

Example Use/Output

With your input file as f.xml you would get:

$ awk -v n=0 '$1=="</a>" {print $1; exit} $1=="<a>" {n=1}; n==1' f.xml
<a>
  <b>1</b>
  <c>2</c>
</a>

(note: the bare n==1 pattern has no action of its own, so it relies on awk's default action (print) to output the record; the {n=1} rule only sets the flag)
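
To confirm that it really is the first block being printed, it helps to test against input where the two blocks differ. The following is only a sketch: the sample contents and the names `sample` and `first_block` are made up for illustration and are not from the question.

```shell
# Hypothetical sample log: the two <a> blocks have different content
# so the output shows which one was selected.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some text line
<a>
  <b>1</b>
  <c>2</c>
</a>
another text line
<a>
  <b>3</b>
  <c>4</c>
</a>
yet another text line
EOF

# Same flag logic as above: n is set at the first <a>, records are printed
# while n==1, and awk exits as soon as the first </a> is seen.
first_block=$(awk -v n=0 '$1=="</a>" {print $1; exit} $1=="<a>" {n=1}; n==1' "$sample")
printf '%s\n' "$first_block"

rm -f "$sample"
```

Because the rules compare $1 exactly, this assumes the <a> and </a> tags sit on lines of their own, as in the question.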

David C. Rankin

First:

$ awk '/<a>/{f=1} f; /<\/a>/{exit}' file
<a>
  <b>1</b>
  <c>2</c>
</a>

Last:

$ tac file | awk '/<\/a>/{f=1} f; /<a>/{exit}' | tac
<a>
  <b>1</b>
  <c>2</c>
</a>

Nth:

$ awk -v n=2 '/<a>/{c++} c==n{print; if (/<\/a>/) exit}' file
<a>
  <b>1</b>
  <c>2</c>
</a>
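
With identical blocks the output above cannot show which block was picked. A rough sketch for checking it, assuming made-up sample data with distinct blocks and a hypothetical wrapper function named `nth_block` (neither is part of the question):

```shell
# nth_block N FILE: print the Nth <a>...</a> block of FILE.
# Hypothetical helper wrapping the "Nth" one-liner above.
nth_block() {
    awk -v n="$1" '/<a>/{c++} c==n{print; if (/<\/a>/) exit}' "$2"
}

# Hypothetical sample log with two distinguishable blocks.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some text line
<a>
  <b>1</b>
  <c>2</c>
</a>
another text line
<a>
  <b>3</b>
  <c>4</c>
</a>
yet another text line
EOF

first_block=$(nth_block 1 "$sample")
second_block=$(nth_block 2 "$sample")
printf 'first:\n%s\nsecond:\n%s\n' "$first_block" "$second_block"

rm -f "$sample"
```

The counter c is incremented at every <a>, so lines are printed only while c equals n, and the exit at the matching </a> stops awk from reading the rest of the file.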
Ed Morton