
I have an HTML page with a lot of blocks like this:

<tr>
 <td> a </td>
</tr>

<tr>
 <td> a </td>
</tr>

<tr>
 <td> a </td>
</tr>

I need to grep out only these blocks, leaving out all other content (such as general text).

I saw another post about pcregrep, but it was not clear to me.

Can somebody help me with grep (or any other solution)?

user691197


You can use sed to get all these blocks:

$ sed -n '/<tr>/,/<\/tr>/p' input.html
<tr>
 <td> a </td>
</tr>
<tr>
 <td> a </td>
</tr>
<tr>
 <td> a </td>
</tr>
kev
  • This extracts everything between the first <tr> and the last </tr>. If the OP wants only the blocks, this solution may have problems, e.g. ...... – Kent Mar 05 '12 at 14:02
  • If <tr> and </tr> are always on separate lines, this command works. – kev Mar 05 '12 at 14:18
  • Forget it. It is really hard to get code formatting into a comment; I tried to make the comment look better. Solving this could be more difficult than the original question. I give up. – Kent Mar 05 '12 at 14:44
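Kent's concern can be reproduced with a small sample file (the name sample.html and its contents are made up for illustration). When the end address of a sed range is a regex, sed only starts testing it on the line after the one that matched the start address. So a <tr>...</tr> block that sits on a single line leaves the range open, and the general text after it is printed too. A minimal sketch, assuming GNU or another POSIX sed:

```shell
# Hypothetical sample input: one single-line block, some general text,
# then one multi-line block.
cat > sample.html <<'EOF'
Some general text
<tr><td> a </td></tr>
Text that should be skipped
<tr>
 <td> b </td>
</tr>
Trailing text
EOF

# The range opens at the single-line block, but /<\/tr>/ is only tested
# from the following line on, so the range stays open until the </tr>
# of the multi-line block. "Text that should be skipped" is printed too.
sed -n '/<tr>/,/<\/tr>/p' sample.html
```

This is why the command is only reliable when, as kev says, <tr> and </tr> always sit on separate lines.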

See my answer to this previous question. Basically, you use grep's -z option plus a very specific regex.

beerbajay
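The grep -z approach from the last answer can be sketched as follows. This assumes GNU grep built with PCRE support (-P is not available in every grep, e.g. the BSD grep on macOS). The regex is an illustrative choice, not necessarily the one from the linked answer, and sample.html is again a made-up file:

```shell
# Hypothetical sample input: two blocks on one line with text between
# them, one multi-line block, and general text around them.
cat > sample.html <<'EOF'
Some general text
<tr><td> a </td></tr> more text <tr><td> b </td></tr>
<tr>
 <td> c </td>
</tr>
Trailing text
EOF

# -z   : treat input (and output) as NUL-terminated records, so a file
#        without NUL bytes is read as one big record
# -P   : Perl-compatible regex, needed for (?s) and the lazy .*?
# -o   : print only the matched parts
# (?s) : lets "." match newlines, so a block may span several lines
# .*?  : lazy, so each match stops at the nearest </tr>
# grep ends every match with a NUL byte; tr turns those into newlines.
grep -Pzo '(?s)<tr>.*?</tr>' sample.html | tr '\0' '\n'
```

Because the match is lazy, two blocks on the same line come out as separate matches and the text between them is dropped. Note that the literal <tr> in the pattern will not match tags with attributes such as <tr class="x">.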