6

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

Excerpt From Input File

<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>

My Regular Expression

(?<=<span id="DInfo1_Municipality">)([^</span>]*)

I have an HTML file saved to disk. I would like to use grep to search through the file and output the contents of a specific span, though I don't know if this is a proper use of grep. When I run grep on the file with the expression read from another file (so I don't mess up escaping any special characters), it doesn't output anything. I have tested the expression in RegExr, and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!

Desired Output

JUPITER
LakeMicrobe
    http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – msw Aug 29 '10 at 01:04

3 Answers

3

Give this a try:

sed -n 's|^<span id="DInfo1_Municipality">\([^<]*\)</span></TD>$|\1|p' file

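or with GNU grep and your regex (-P enables the Perl-compatible syntax that lookbehind needs; -o prints only the matched text):

grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)' file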
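
One caveat about the pattern, shown as a minimal sketch: `[^</span>]` is a bracket expression that excludes the individual characters `<`, `/`, `s`, `p`, `a`, `n` and `>`, not the literal string `</span>`. It happens to work for "JUPITER" because that value contains none of them. The line below uses a made-up value, "Palm Beach", which is an assumption for illustration and not taken from the question's file. It uses sed so that it runs without PCRE support; the bracket expression behaves the same way under grep -P.

```shell
# Hypothetical line; "Palm Beach" is a made-up value, not from the question's file.
line='<span id="DInfo1_Municipality">Palm Beach</span></TD>'

# [^</span>] is a bracket expression: it excludes <, /, s, p, a, n and >,
# so the capture stops at the first "a".
bracket=$(printf '%s\n' "$line" |
  sed -n 's|.*<span id="DInfo1_Municipality">\([^</span>]*\).*|\1|p')

# [^<] excludes only "<", so the capture runs up to the closing tag.
upto_tag=$(printf '%s\n' "$line" |
  sed -n 's|.*<span id="DInfo1_Municipality">\([^<]*\).*|\1|p')

printf '%s\n' "$bracket"   # P
printf '%s\n' "$upto_tag"  # Palm Beach
```

If the values can contain lowercase letters, `[^<]*` is probably the safer character class to use in either command.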
Dennis Williamson
  • Passing the -P flag to grep allowed me to do the lookbehind/lookahead I was after. Thanks for the tip – james Jan 10 '13 at 18:10
1

By default, grep doesn't support that type of regex (lookbehind assertions), and it's a very poor tool for this. For the example given it is workable, but it will break in many situations.

grep -io "<span id=\"DInfo1_Municipality\">.*</span>" file.html | grep -io ">[^<]*" | grep -io "[^>]*"

Something crazy like that; not a good idea.
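
As a rough sketch of one way this can break: the greedy `.*` in the first stage runs to the last `</span>` on the line, so a second span on the same line leaks into the output. The input line here is hypothetical (the `Other` id and its text are made up), and the stages read from stdin instead of a file.

```shell
# Hypothetical input: two spans on one line (not taken from the question's file).
line='<span id="DInfo1_Municipality">JUPITER</span><span id="Other">OTHER</span>'

# Same three stages as the pipeline above, reading from stdin instead of a file.
# The last pattern is written [^>][^>]* so that it cannot match an empty string.
result=$(printf '%s\n' "$line" \
  | grep -io '<span id="DInfo1_Municipality">.*</span>' \
  | grep -io '>[^<]*' \
  | grep -io '[^>][^>]*')

# Prints JUPITER and then OTHER, one per line.
printf '%s\n' "$result"
```

Only JUPITER was wanted, which is why a line-oriented pattern match is fragile against markup whose layout can change.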

Paul Creasey
  • Thank you. I had a feeling I was trying to do something that grep didn't support. It looks like I'll have to do some actual programming. – LakeMicrobe Aug 29 '10 at 01:15
1
sed -n '/DInfo1_Municipality/s/<\/span.*//p' file | sed 's/.*>//'
ghostdog74