I'd like to know how to write a regex for the following code.

<a href="/search?q=user:111111+[apple]" class="post-tag" title="show all posts by this user in 'apple'">Apple</a><span class="item-multiplier">&times;&nbsp;171</span><br>

I just need to extract the link text, Apple, from the source above.

halfer
giri
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – kennytm Mar 24 '10 at 18:23
  • KennyTM, while it's true that you can't parse HTML using regex, this question is not about parsing HTML but about parsing an <a> tag, which is certainly possible. – Omry Yadan Mar 24 '10 at 18:29
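To illustrate the point that a single <a> tag is tractable, here is a minimal sketch that matches one anchor element and captures the text between its opening and closing tags. It assumes the link text contains no nested tags (no '<' inside it) and that no attribute value contains '>'. The class name AnchorTextDemo and the method extract are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTextDemo {
    // <a\b      : "<a" followed by a word boundary, so <abbr> or <area> do not match
    // [^>]*>    : any attributes up to the end of the opening tag
    // ([^<]*)   : group 1, the link text (assumes no nested tags inside the link)
    // </a>      : the closing tag
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>([^<]*)</a>", Pattern.CASE_INSENSITIVE);

    // Returns the text of the first <a> element, or null if there is none.
    static String extract(String html) {
        Matcher m = ANCHOR.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"/search?q=user:111111+[apple]\" class=\"post-tag\""
                + " title=\"show all posts by this user in 'apple'\">Apple</a>"
                + "<span class=\"item-multiplier\">&times;&nbsp;171</span><br>";
        System.out.println(extract(txt)); // prints Apple
    }
}
```

Unlike a pattern built from "any word" pieces, this one is anchored on the tag structure itself, so it returns the capitalized link text rather than a word that happens to appear in an attribute.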

1 Answer

There is an excellent tool, txt2re, that can easily generate regular expressions in various languages. I used it to generate the following:

import java.util.regex.*;

class Main
{
  public static void main(String[] args)
  {
    String txt="<a href=\"/search?q=user:111111+[apple]\" class=\"post-tag\" title=\"show all posts by this user in 'apple'\">Apple</a><span class=\"item-multiplier\">&times;&nbsp;171</span><br>";

    String re1=".*?";   // Non-greedy match on filler
    String re2="(?:[a-z][a-z]+)";   // Uninteresting: word
    String re3=".*?";   // Non-greedy match on filler
    String re4="(?:[a-z][a-z]+)";   // Uninteresting: word
    String re5=".*?";   // Non-greedy match on filler
    String re6="(?:[a-z][a-z]+)";   // Uninteresting: word
    String re7=".*?";   // Non-greedy match on filler
    String re8="((?:[a-z][a-z]+))"; // Word 1

    Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(txt);
    if (m.find())
    {
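        String word1=m.group(1);
        // On this input the fourth word of two or more letters is the lowercase "apple"
        // inside the href, so this prints "(apple)" rather than the link text "Apple".
        System.out.println("("+word1+")");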
    }
  }
}
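Because the generated pattern simply picks the fourth word it finds, it depends on the exact attribute layout. A page that lists several tags can instead be handled by anchoring on the class attribute and looping with Matcher.find(). The sketch below assumes double-quoted attributes and link text without nested tags. The class name PostTagDemo, the method tagNames, and the second "Orange" link in the sample input are hypothetical and added only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostTagDemo {
    // Captures the link text of every <a> element that carries class="post-tag".
    // Assumes double-quoted attributes and link text without nested tags.
    private static final Pattern POST_TAG = Pattern.compile(
            "<a\\b[^>]*\\bclass=\"post-tag\"[^>]*>([^<]*)</a>");

    static List<String> tagNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = POST_TAG.matcher(html);
        while (m.find()) {           // each find() resumes after the previous match
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // The second link ("Orange") is a hypothetical extra tag for illustration.
        String html = "<a href=\"/search?q=user:111111+[apple]\" class=\"post-tag\">Apple</a>"
                + "<span class=\"item-multiplier\">&times;&nbsp;171</span><br>"
                + "<a href=\"/search?q=user:111111+[orange]\" class=\"post-tag\">Orange</a>"
                + "<span class=\"item-multiplier\">&times;&nbsp;5</span><br>";
        System.out.println(tagNames(html)); // prints [Apple, Orange]
    }
}
```

The span elements are skipped because the pattern requires the element to start with "<a" followed by a word boundary.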
Omry Yadan