sed: grab first instance of html tag

Question

I have an HTML file that uses the <table> tag multiple times throughout the document. I want to use sed to print just the first <table> element to the console.

This is a snippet of the html that I am trying to parse. There are over 10 instances of the <table> tag.

My HTML:

<table border="0" class="first">
  <tr class="a">
     <th>Tests</th>
     <th>Errors </th>
  </tr>
  <tr class="b">
     <td>32</td>
     <td>0</td>
  </tr>
</table>
<table border="0" class="second">
  <tr class="c">
     <th>Tests</th>
     <th>Errors </th>
  </tr>
  <tr class="d">
     <td>32</td>
     <td>0</td>
  </tr>
</table>

Here is the code I'm running

sed -n 's:.*<table\(.*\)</table>.*:\1:p' surefire-report.html

I want to grab everything within the first <table> element, including the tags themselves. So the output should be just this:

<table border="0" class="first">
  <tr class="a">
     <th>Tests</th>
     <th>Errors </th>
  </tr>
  <tr class="b">
     <td>32</td>
     <td>0</td>
  </tr>
</table>

While regex is possible to use in this case, [it's still not a good idea in general for HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) — jeremysprofile, Nov 30 '18 at 18:19
Please [edit] the question and update with the expected output instead of adding it in comments. The example input is not ideal because there is only a single complete `<table>`/`</table>` pair in there anyway. – Benjamin W., Nov 30 '18 at 19:46

score 0 · Answer 1 · answered Nov 30 '18 at 18:44


If I understand you correctly, this should work:

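FILE=surefire-report.html

# Line numbers of the first opening and the first closing table tag
START=$(grep -n -m1 "<table" "$FILE" | cut -d ':' -f1)
END=$(grep -n -m1 "</table" "$FILE" | cut -d ':' -f1)

# Print only that range of lines
sed -n "${START},${END}p" "$FILE"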
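
The same range can probably also be printed with a single sed call, without looking up line numbers first. This is only a minimal sketch: it assumes the tables are not nested, and the `first_table` function name and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# Print the first <table>...</table> block of a file, then stop reading.
# Assumption: tables are not nested inside each other.
first_table() {
    # -n suppresses default output; inside the range each line is printed,
    # and sed quits as soon as the first closing tag has been printed.
    sed -n '/<table/,/<\/table>/{p;/<\/table>/q;}' "$1"
}

# Hypothetical sample file, only for demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>intro</p>
<table border="0" class="first">
  <tr class="b"><td>32</td></tr>
</table>
<table border="0" class="second">
  <tr class="d"><td>32</td></tr>
</table>
EOF

first_table "$sample"
rm -f "$sample"
```

For the file in the question this would be called as `first_table surefire-report.html`. Quitting at the first closing tag means the rest of the file is never read.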


Gustavo Ferreira


Is this something that I enter in bash/terminal? – dooge Nov 30 '18 at 20:48
Yes. You can create a bash script with this or copy and paste it into a bash terminal. – Gustavo Ferreira Nov 30 '18 at 21:09
