
I could really use some help. I have the following two lines.

<TD noWrap>Data: <B><SPAN class="TableBody clsBold">4</SPAN></B></TD>
<TD noWrap>Format: <B><SPAN class="TableBody clsBold">9</SPAN>/<SPAN class=TableBody> </SPAN></B></TD>

I need to grab the text inside the first SPAN on each line (4 and 9, respectively).

I'm using the following regex statement:

(\s)*(<B>)*<(?<SPAN>\w*)(?:.*)>(?:.*)</\k<SPAN>>

This works great for the first line as in:

Data: (\s)*(<B>)*<(?<SPAN>\w*)(?:.*)>(?:.*)</\k<SPAN>>

But doesn't work with

Format: (\s)*(<B>)*<(?<SPAN>\w*)(?:.*)>(?:.*)</\k<SPAN>>

because it grabs the SPAN after the section I need, so I don't get the 9 in group 4.
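To make the failure concrete, here is a minimal Python sketch of what the greedy `.*` does on the two rows above. It is a port under stated assumptions, not the original code: Python writes named groups as `(?P<name>...)` and back-references as `(?P=name)`, the content group is made capturing so it can be inspected, and the names `GREEDY`, `BOUNDED` and `grab` are made up for illustration.

```python
import re

# The two sample rows from the question.
data_row = '<TD noWrap>Data: <B><SPAN class="TableBody clsBold">4</SPAN></B></TD>'
format_row = ('<TD noWrap>Format: <B><SPAN class="TableBody clsBold">9</SPAN>'
              '/<SPAN class=TableBody> </SPAN></B></TD>')

# Python port of the pattern from the question. Assumption: the content
# group is capturing (named "text") so the grabbed value can be printed.
GREEDY = r'\s*(?:<B>)*<(?P<tag>\w*)(?:.*)>(?P<text>.*)</(?P=tag)>'

# Bounded variant: [^>]* cannot leave the opening tag and [^<]* stops at
# the next tag, so the match cannot slide over to a later SPAN.
BOUNDED = r'\s*(?:<B>)*<(?P<tag>\w+)[^>]*>(?P<text>[^<]*)</(?P=tag)>'


def grab(prefix, body, row):
    """Return the text captured after `prefix`, or None if nothing matches."""
    match = re.search(re.escape(prefix) + body, row)
    return match.group('text') if match else None


greedy_results = [grab('Data:', GREEDY, data_row),
                  grab('Format:', GREEDY, format_row)]
bounded_results = [grab('Data:', BOUNDED, data_row),
                   grab('Format:', BOUNDED, format_row)]

print(greedy_results)   # ['4', ' ']
print(bounded_results)  # ['4', '9']
```

On the Format row the first greedy `.*` runs to the end of the line and backtracks only as far as the `>` of the second SPAN, which is why the captured text is the blank inside that SPAN instead of the 9.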

How do I get what I need using the same regex line with a different prefix?

nickb
Jordan
  • Using regex to parse HTML is usually a bad idea. Have you considered alternatives such as an HTML parser? – Mark Byers Jun 02 '12 at 23:21
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. That said, what programming language are you driving your regex with? We'll recommend a better option to more reliably parse stuff out of HTML. – Pavel Veller Jun 02 '12 at 23:27

1 Answer

I would specify exactly what I'm looking for in the regex. Something like this should suffice:

<TD noWrap>([A-Za-z]+):\s*<B><SPAN class="TableBody clsBold">(\d+)</SPAN>

It uses only two capturing groups, and the colon sits outside the first one, so you should get ("Data", "4") and ("Format", "9") as matches.
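As a rough illustration, here is how a pattern of this shape behaves in Python, with the colon kept outside the first group so the captured label is `Data` rather than `Data:`. The question does not say which language is in use, so treat this as a sketch; `ROW`, `html` and `wanted` are hypothetical names, and `wanted` stands in for the handful of prefixes that are actually needed.

```python
import re

# Explicit pattern: a label made of letters, a colon, then the digits inside
# the first bold SPAN. Group 1 is the label, group 2 is the value.
ROW = re.compile(
    r'<TD noWrap>([A-Za-z]+):\s*<B><SPAN class="TableBody clsBold">(\d+)</SPAN>'
)

# The two sample rows from the question.
html = '''<TD noWrap>Data: <B><SPAN class="TableBody clsBold">4</SPAN></B></TD>
<TD noWrap>Format: <B><SPAN class="TableBody clsBold">9</SPAN>/<SPAN class=TableBody> </SPAN></B></TD>'''

# findall returns one (label, value) tuple per matching row.
pairs = ROW.findall(html)
print(pairs)  # [('Data', '4'), ('Format', '9')]

# Hypothetical filter: keep only the labels of interest.
wanted = {'Format'}
values = {label: value for label, value in pairs if label in wanted}
print(values)  # {'Format': '9'}
```

Matching every row once and filtering afterwards avoids building a separate regex per prefix. Note that `[A-Za-z]+` will not match labels containing spaces or punctuation.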

nickb
  • ;( There are a thousand prefixes in the file and I only need a few of them! So I toss the few that I need into an array and use each one as the prefix to the regex. I get the lines that I need, but not the grouping for the second (Format) line. – Jordan Jun 02 '12 at 23:38
  • Actually, on second thought! This works out, because I can just cycle through all formatted lines. Brilliant! You gotta love SO! – Jordan Jun 02 '12 at 23:41
  • How would I keep this regex from picking up the second set of SPAN tags: Zip Code: 06902- 2630 – Jordan Jun 03 '12 at 02:03
  • Do you need it to capture the value within the second set of span tags, or no? – nickb Jun 03 '12 at 22:11
  • No. Don't need second value. But I believe I figured it out. – Jordan Jun 04 '12 at 10:46