1

My HTML looks like:

<td class="price" valign="top"><font color= "blue">&nbsp;&nbsp;$&nbsp;      5.93&nbsp;</font></td>

I tried:

String result = "";
Pattern p = Pattern.compile("\"blue\">&nbsp;&nbsp;$&nbsp;(.*)&nbsp;</font></td>");

Matcher m = p.matcher(text);

if (m.find())
    result = m.group(1).trim();

Doesn't seem to be matching.

Am I missing an escape character?

Blankman
  • 3
    Avoid parsing HTML with regular expressions if possible. Use an HTML parser instead. – Mark Byers Apr 17 '10 at 23:49
  • No html parsing using regex please.. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Mahesh Velaga Apr 18 '10 at 00:23

2 Answers

2

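Unless it is escaped at the regex level, $ is an anchor: it matches the end of the input (or the end of a line in MULTILINE mode), not a literal dollar sign. The regex needs a single \ in front of the $ to escape it, and that backslash must itself be escaped in the Java String literal, i.e. you write two \ characters. So ...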

... Pattern.compile("\"blue\">&nbsp;&nbsp;\\$&nbsp;(.*)&nbsp;</font></td>");
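To check the escaped pattern against the sample HTML from the question, here is a minimal self-contained sketch. The class name PriceRegexDemo and the helper extractPrice are made up for illustration; only the pattern string comes from the line above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
class PriceRegexDemo {
    // "\\$" in Java source becomes the regex \$, which matches a literal dollar sign.
    static final Pattern PRICE = Pattern.compile(
            "\"blue\">&nbsp;&nbsp;\\$&nbsp;(.*)&nbsp;</font></td>");

    // Returns the trimmed text captured after the dollar sign,
    // or an empty string if the pattern is not found.
    static String extractPrice(String text) {
        Matcher m = PRICE.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // The sample markup from the question, written as a Java string literal.
        String html = "<td class=\"price\" valign=\"top\"><font color= \"blue\">"
                + "&nbsp;&nbsp;$&nbsp;      5.93&nbsp;</font></td>";
        System.out.println(extractPrice(html)); // prints 5.93
    }
}
```

The capture group picks up the run of spaces before the number as well, which is why the trim() call from the original code is still needed.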

But the folks who commented that you shouldn't use regexes to parse HTML are absolutely right!! Unless you want chronically fragile code, your code should use a strict or non-strict HTML parser.

Stephen C
  • I tried using HtmlParser, but got stuck so I am going the regex route! – Blankman Apr 18 '10 at 01:17
  • @Blankman - I think you should go back to HtmlParser. Or if the problem is that you have malformed HTML, switch to a non-strict parser like HtmlCleaner. – Stephen C Apr 18 '10 at 01:47
  • here is the htmlParser question: http://stackoverflow.com/questions/2660866/parsing-html-using-htmlparser thanks! – Blankman Apr 18 '10 at 02:49
1

Maybe you need to escape the $ (with two backslashes, I think)?
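As an alternative to counting backslashes, the literal parts of the pattern can be passed through Pattern.quote, which is part of java.util.regex. The sketch below is an illustration only; the class name QuotedPriceRegexDemo and its helper methods are hypothetical. It also shows that the unescaped pattern from the question never finds a match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
class QuotedPriceRegexDemo {
    // Pattern.quote wraps its argument so every character, including $,
    // is matched literally. Only the (.*) group remains "live" regex.
    static final Pattern QUOTED = Pattern.compile(
            Pattern.quote("\"blue\">&nbsp;&nbsp;$&nbsp;")
            + "(.*)"
            + Pattern.quote("&nbsp;</font></td>"));

    // The pattern from the question: here $ is an end anchor, so the
    // "&nbsp;" that must follow it can never match.
    static final Pattern UNESCAPED = Pattern.compile(
            "\"blue\">&nbsp;&nbsp;$&nbsp;(.*)&nbsp;</font></td>");

    // Returns the trimmed captured text, or an empty string if there is no match.
    static String extractPrice(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    // Reports whether the unescaped pattern finds anything in the text.
    static boolean unescapedMatches(String text) {
        return UNESCAPED.matcher(text).find();
    }

    public static void main(String[] args) {
        String html = "<font color= \"blue\">&nbsp;&nbsp;$&nbsp;      5.93&nbsp;</font></td>";
        System.out.println(extractPrice(html));      // prints 5.93
        System.out.println(unescapedMatches(html));  // prints false
    }
}
```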

ZyX