How to grep multiple lines from an HTML page

Question

I have an HTML page with lots of tags like this:

<tr>
 <td> a </td>
</tr>

<tr>
 <td> a </td>
</tr>

<tr>
 <td> a </td>
</tr>

I need to extract only these blocks, leaving out all other content (such as plain text).

I saw another post about pcregrep, but it was not clear to me.

Can somebody help me with grep (or any other solution)?

score 1 · Answer 1 · answered Mar 05 '12 at 13:29


You can use sed to get all these blocks:

$ sed -n '/<tr>/,/<\/tr>/p' input.html
<tr>
 <td> a </td>
</tr>
<tr>
 <td> a </td>
</tr>
<tr>
 <td> a </td>
</tr>

answered Mar 05 '12 at 13:29

kev

This extracts everything between the first `<tr>` and the last `</tr>`. If the OP wants only the blocks, this solution may have a problem, e.g. ...... – Kent Mar 05 '12 at 14:02
If `<tr>` and `</tr>` are always on separate lines, this command works. – kev Mar 05 '12 at 14:18
Forget it. It is really hard to get code formatting into a comment; I tried to make the comment look better. Solving this could be more difficult than the original question. I give up. – Kent Mar 05 '12 at 14:44
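The comments above point at a real edge case: when a block opens and closes on the same line (for example `<tr><td> a </td></tr>`), sed does not test the end address of a range on the line that started it, so the range runs on to the next line containing `</tr>` and drags any text in between along with it. A minimal sketch of one workaround, assuming every block starts with a literal `<tr>` tag: on a `<tr>` line, keep appending input lines with `N` until the pattern space contains `</tr>`, then print it. The function name `extract_tr_blocks` is hypothetical, and like the original command it still works on whole lines.

```shell
#!/bin/sh
# Hypothetical helper: print every <tr> ... </tr> block, including blocks
# that open and close on the same line. Reads the named files, or stdin.
extract_tr_blocks() {
  sed -n '
/<tr>/{
  :a
  /<\/tr>/!{
    N
    ba
  }
  p
}' "$@"
}
```

Usage: `extract_tr_blocks input.html`. On a `<tr>` line that already contains `</tr>`, the loop is skipped and the line is printed on its own; otherwise `N` joins the following lines and `ba` jumps back to `:a` until the closing tag has been read.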

score 0 · Answer 2 · edited May 23 '17 at 12:12


See my answer to this previous question. Basically, you use grep's -z option plus a very specific regex.
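The linked answer is not reproduced here, so the following is only a sketch of the general idea, assuming GNU grep built with PCRE support (`-P`). With `-z`, grep treats its input as NUL-terminated records, so an HTML file without NUL bytes is read as a single record and a pattern can span newlines. `(?s)` lets `.` match newlines, the non-greedy `.*?` stops each match at the nearest `</tr>`, `-o` prints only the matched text, and `-h` suppresses file-name prefixes. Each match is written out followed by a NUL byte, which `tr` turns back into a newline. The function name `extract_tr_blocks_grep` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (assumes GNU grep with -P support):
#   -z  read input as NUL-separated records (whole file = one record)
#   -P  use Perl-compatible regular expressions
#   -o  print only the matched text, one match per record
#   -h  never prefix output with file names
# (?s) makes '.' match newlines; '.*?' stops at the nearest </tr>.
extract_tr_blocks_grep() {
  grep -hPzo '(?s)<tr>.*?</tr>' "$@" | tr '\0' '\n'
}
```

Usage: `extract_tr_blocks_grep input.html`. Unlike the sed approach, `-o` prints only the matched text, so anything sharing a line with `<tr>` or `</tr>` but outside the block is left out.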

edited May 23 '17 at 12:12

Community

answered Mar 05 '12 at 13:29

beerbajay
