I am trying to capture part of a string using a regex. I can match the label itself, but not the text that follows it.

Here is the HTML source I would like to capture from:

<div class="FindBoxTopL fl_left">
<b>Salary: </b> $10.00 <br>
<b>Location: </b> Wisconsin Madison<br>
<b>Country:</b>United States<br>

<b>Contract Type: </b>Part Time<br><b>Closing Date: </b>August 15, 2014<br>
</div>

From the above HTML I would like to capture: Wisconsin Madison

So I would match the string Location:\s</b>, then capture the string Wisconsin Madison and stop at the line break (<br>).

The final captured output would be: Wisconsin Madison

Can anyone help please?

Mannie Singh

2 Answers

Use the right tool for the job instead of trying to parse HTML with regular expressions. I would use the Html Agility Pack, which makes parsing the document and extracting values a lot easier.
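Html Agility Pack is a .NET library, so the following is not its API. As a rough sketch of the same parser-based idea, here is a version using Python's standard-library html.parser instead. The class name FieldExtractor and its fields attribute are made up for this illustration, and it assumes each value is the plain text between a closing </b> and the next <br>, as in the question's snippet.

```python
from html.parser import HTMLParser

# Sample markup copied from the question.
HTML = """<div class="FindBoxTopL fl_left">
<b>Salary: </b> $10.00 <br>
<b>Location: </b> Wisconsin Madison<br>
<b>Country:</b>United States<br>

<b>Contract Type: </b>Part Time<br><b>Closing Date: </b>August 15, 2014<br>
</div>"""


class FieldExtractor(HTMLParser):
    """Hypothetical helper: maps each <b>Label:</b> to the text after it."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_label = False  # True while inside a <b> element
        self._label = None      # label whose value we are waiting for

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_label = True
            self._label = ""
        elif tag == "br":
            self._label = None  # a line break ends the current value

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_label = False

    def handle_data(self, data):
        if self._in_label:
            self._label += data
        elif self._label is not None and data.strip():
            key = self._label.strip().rstrip(":")
            self.fields[key] = data.strip()


parser = FieldExtractor()
parser.feed(HTML)
parser.close()
print(parser.fields["Location"])
```

Because the parser handles the tags itself, small differences in whitespace around the labels do not change the result.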

If you still choose to use a regular expression for this, you can use the following:

<b>Location:\s*</b>\s*([^<]*)

Use capturing group #1 to access your match result.
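As an illustration of reading group #1, here is a minimal sketch in Python (the question does not say which language is in use, so this is an assumption). The helper name extract_field is hypothetical; it builds the same pattern for any label and strips surrounding whitespace from the captured text.

```python
import re

# Sample markup copied from the question.
HTML = """<div class="FindBoxTopL fl_left">
<b>Salary: </b> $10.00 <br>
<b>Location: </b> Wisconsin Madison<br>
<b>Country:</b>United States<br>

<b>Contract Type: </b>Part Time<br><b>Closing Date: </b>August 15, 2014<br>
</div>"""


def extract_field(html, label):
    """Hypothetical helper: return the text after <b>label:</b>, or None."""
    pattern = r"<b>" + re.escape(label) + r":\s*</b>\s*([^<]*)"
    match = re.search(pattern, html)
    # Group 1 holds everything up to the next "<", so trim stray spaces.
    return match.group(1).strip() if match else None


print(extract_field(HTML, "Location"))
```

Since the pattern uses \s* rather than a single \s, it also matches labels such as Country: that have no space before or after the closing </b>.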

hwnd

Use this regex:

/(?<=Location:\s\<\/b\>\s)(.+?)(?=\<br\>)/g

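Explanation:

  • (?<=Location:\s\<\/b\>\s) : lookbehind, the match must be preceded by Location:, one whitespace character, </b>, and one more whitespace character
  • (.+?) : lazy capture, takes as few characters as possible before the lookahead succeeds
  • (?=\<br\>) : lookahead, the match must be followed by <br>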
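The lookaround version can be tried in Python as follows. This is a sketch rather than the answer's original snippet: it drops the /…/g delimiters and the unnecessary backslashes. Python's re module only accepts fixed-width lookbehinds, which this pattern satisfies because each \s matches exactly one character.

```python
import re

# Sample markup copied from the question.
HTML = """<div class="FindBoxTopL fl_left">
<b>Salary: </b> $10.00 <br>
<b>Location: </b> Wisconsin Madison<br>
<b>Country:</b>United States<br>

<b>Contract Type: </b>Part Time<br><b>Closing Date: </b>August 15, 2014<br>
</div>"""

# Lookbehind and lookahead are zero-width, so the whole match (group 0)
# is already just the captured text.
location_match = re.search(r"(?<=Location:\s</b>\s)(.+?)(?=<br>)", HTML)
location = location_match.group(0) if location_match else None
print(location)

# The same pattern shape fails for Country, because that line has no
# whitespace around </b> and each \s requires exactly one character.
country_match = re.search(r"(?<=Country:\s</b>\s)(.+?)(?=<br>)", HTML)
print(country_match)
```

The second search shows the trade-off: requiring exactly one whitespace character on each side of </b> makes the pattern sensitive to small formatting changes in the markup.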


zessx