
This pattern does not match in my code, but when I test it in a regex testing tool it matches. Why is that? I hope someone can help me.

The code looks like this:

    String html = " <div class=\"c-gap-bottom-small\"> <a href=\"http://www.baidu.com/link?url=uAUkCOuk7A6EucqsYf4iZ8Dr6wD8zMwAwh8V8exRH-fIt_LLbAsV-6344l7KQAU6apfnoznw-bDv3LZWcADWy_\" target=\"_blank\" > 诺基亚推<em>Android</em>手机证据确凿! </a> </div> <div class=\"c-row c-gap-bottom-small\">  <a href=\"http://www.baidu.com/link?url=uAUkCOuk7A6EucqsYf4iZ8Dr6wD8zMwAwh8V8exRH-fIt_LLbAsV-6344l7KQAU6apfnoznw-bDv3LZWcADWy_\" target=\"_blank\" class=\"op_sp_realtime_preBox c-span6\" data-click=\"{'title':'android的最新相关信息'}\"> <img src=\"http://t11.baidu.com/it/u=3675500557,4147720515&fm=55\" class=\"c-img c-img6\" /> </a>  <div class=\"c-span-last\">   虽然这不能保证诺基亚的<em>Android</em>手机即将发布,但至少这证明了诺基亚正在测试<em>Android</em>手机,大家期待吗? 你期待诺基亚的<em>Android</em>手机吗? 1.你期待诺基亚...   <br /><span style=\"color:#008000\">驱动之家</span>   &nbsp;<span style=\"color:#666;\">3小时前</span>  </div> </div>           <div class=\"c-row\">   <span style=\"color:#666;float:right\">2天前</span>  <a href=\"http://www.baidu.com/link?url=TYqV2ZEzCcBgcqX-GRxZGEJGnq8r266exUHm54Mpgsc202Qp6PJL9cvZalaRBbWPJXuryLpOdhbGbANjZZFODq\" target=\"_b";
    Pattern p = Pattern
            .compile("<div class=\"c-gap-bottom-small\">(.*?)</div>",
                    Pattern.DOTALL);
    Matcher match = p.matcher(html);
    if (match.matches()) {
        Log.i("match", match.group(1));
    }
liukuo362573

1 Answer


matches() tries to match the entire input against the regex; it does not check whether some part of the input matches. To find a match within the input, use find().
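The difference can be sketched with the pattern from the question. This is only an illustration: the class name `FindVsMatches`, the helper `firstDivBody`, and the shortened HTML string are assumptions made up for the example, and `System.out.println` stands in for `Log.i` so it runs outside Android.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {
    // Same pattern and flag as in the question.
    static final Pattern DIV = Pattern.compile(
            "<div class=\"c-gap-bottom-small\">(.*?)</div>", Pattern.DOTALL);

    // Hypothetical helper: returns the first captured div body, or null if none is found.
    static String firstDivBody(String html) {
        Matcher m = DIV.matcher(html);
        // find() scans for the pattern anywhere in the input.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question (an assumption, not the real page).
        String html = " <div class=\"c-gap-bottom-small\"> <a href=\"#\">title</a> </div>"
                + " <div class=\"c-row\">rest</div>";

        // matches() requires the whole string to fit the pattern, so this prints false:
        // the input starts with a space and continues after the first </div>.
        System.out.println(DIV.matcher(html).matches());

        // find() locates the first occurrence and exposes its capture group.
        System.out.println(firstDivBody(html));
    }
}
```

To collect every matching block rather than just the first, call `find()` in a `while` loop on the same `Matcher`.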

On the other hand, attempting to parse HTML-like grammars with regex is destined to fail. Use a real tag-soup parser such as jsoup instead.

laalto