1

I have a string variable from which I would like to extract the title value of the element with id="resultcount". The output should be 2.

var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';

I tried the following regex but it is not working:

/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/
ThinkingStiff
kate.g
    To refer you to some epicness: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – stefanw Nov 16 '10 at 20:17

4 Answers

3

Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't reliably parse HTML using regular expressions, but if you are using jQuery you can let the browser's own parser do the work:

$('#resultcount', '<div>'+str+'</div>').attr('title')

It will return undefined if resultcount is not found or if it has no title attribute.

Leo Lobeto
1

To make sure it doesn't matter which attribute (id or title) comes first in the string, first extract the entire HTML element with the required id:

var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")

Then extract the title from that string:

var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is the string "2"

But, as people have previously mentioned, it is unreliable to use regular expressions for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
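The quote-style and attribute-order fragility mentioned above can be reduced somewhat with a two-step match. The following is only a minimal sketch, not the answer's original code: `getAttrById` is a hypothetical helper name, and it assumes attribute values never contain `<` or `>`. It is still not a real HTML parser.

```javascript
// Hypothetical helper (an assumption, not from the original answer): reads one
// attribute from the first opening tag whose id matches, whatever the attribute order.
function getAttrById(html, id, attr) {
  // Escape regex metacharacters in the caller-supplied names.
  const esc = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // Step 1: find the opening tag that carries the id (single or double quotes).
  const tagRe = new RegExp('<[^<>]*\\sid\\s*=\\s*(["\'])' + esc(id) + '\\1[^<>]*>', 'i');
  const tag = html.match(tagRe);
  if (!tag) return null;

  // Step 2: pull the attribute value out of that tag only.
  const attrRe = new RegExp('\\s' + esc(attr) + '\\s*=\\s*(["\'])(.*?)\\1', 'i');
  const m = tag[0].match(attrRe);
  return m ? m[2] : null;
}

// Shortened version of the string from the question.
const str = '<table id="addrResults"><tr></tr></table>' +
  '<span id="resultcount" title="2" style="display:none;">2</span>';
console.log(getAttrById(str, 'resultcount', 'title')); // logs 2
```

The helper returns null when the id or the attribute is missing, which makes the "not found" case explicit instead of returning the unmodified input string as replace() would.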

Igor Jerosimić
0

Please see this earlier response, entitled "You can't parse [X]HTML with regex":

RegEx match open tags except XHTML self-contained tags

Community
pndc
0

Well, since no one else is jumping in on this, and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into JavaScript syntax for you, but I think you'll be able to do that.

span id="resultcount" title="(\d+)"

The part you're looking for is the capturing group $1, which is the (\d+) part. It matches one or more digits between the quote marks.
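Since the question uses JavaScript, the pattern should carry over as a regex literal more or less unchanged. A minimal sketch, assuming the attributes appear in exactly this order with double quotes (the variable names `html`, `match` and `resultCount` are illustrative only):

```javascript
// Sample input from the question, shortened to the relevant span.
const html = '<span id="resultcount" title="2" style="display:none;">2</span>';

// The same pattern written as a JavaScript regex literal.
const match = html.match(/span id="resultcount" title="(\d+)"/);

// match[1] holds the first capturing group; match is null when nothing matches.
const resultCount = match ? Number(match[1]) : null;
console.log(resultCount); // logs 2
```

Checking for null before reading match[1] avoids a TypeError when the span is absent from the string.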

Keng