I have code that treats the whole String as a single tag and extracts everything together, in this case "abc</a> <a>def". How can I extract the content of each tag separately, to obtain two Strings: "abc" and "def"?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Ex {
    public static void main(String[] args) throws Exception {
        Ex.findInTags("<a>((.*))</a>", "<a>abc</a> <a>def</a>");
    }

    public static void findInTags(String a, String b) {
        Pattern pattern = Pattern.compile(a);
        Matcher matcher = pattern.matcher(b);
        if (matcher.find()) {
            // prints: abc</a> <a>def
            System.out.println(matcher.group(1));
        }
    }
}
padrian92
  • I am not VotingToClose only because I have some doubts, but possibly a duplicate of : http://stackoverflow.com/a/1732454/598289 – SJuan76 Oct 03 '16 at 07:51
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – baudsp Oct 03 '16 at 08:00

1 Answer

Do not use regex to parse XML/HTML: these are not regular languages, so regular expressions cannot handle them in general. Use a dedicated tool such as XPath (for XML) or Jsoup (for HTML).

Jsoup.parse("<a>abc</a> <a>def</a>").select("a")

will give you all <a> elements; you can iterate over them and get the required text from each node.
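For the XPath route mentioned above, here is a minimal sketch that uses only the XML APIs bundled with the JDK. It assumes the input is well-formed XML once wrapped in a single root element; the <root> wrapper, the class name TagText and the helper textOfAnchors are hypothetical names chosen for illustration, not part of the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagText {
    // Hypothetical helper: returns the text content of every <a> element.
    public static List<String> textOfAnchors(String fragment) throws Exception {
        // Assumption: wrapping in one root element makes the fragment well-formed XML.
        String xml = "<root>" + fragment + "</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select all <a> elements anywhere in the document.
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//a", doc, XPathConstants.NODESET);

        // Collect the text of each matched node separately.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String text : textOfAnchors("<a>abc</a> <a>def</a>")) {
            System.out.println(text);
        }
    }
}
```

Run on the string from the question, this prints abc and def on separate lines. Real-world HTML is often not well-formed XML, so this sketch only suits XML-like input; for HTML, an HTML parser such as Jsoup is the safer choice.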

Antoniossss