1

I'm receiving HTML code from XML and trying to find the last <span> in Java. When I run the code I always get the first span, and groupCount shows me that there is only one match (the first one). I also tried a hardcoded version of the XML (I created a String variable) and still got the same result.

Here is my code:

String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
Pattern pattern3 = Pattern.compile("<span.*?(?=</span>)");
Matcher matcher3 = pattern3.matcher(text);
if (matcher3.find()) {
    int result = matcher3.groupCount();
    String s = matcher3.group(result); // ->> always shows the first result
}
nquincampoix
Shira
  • Possible duplicate of [find the last match with java/regex/matcher](http://stackoverflow.com/questions/6417435/find-the-last-match-with-java-regex-matcher) – Super Hornet Sep 05 '15 at 09:32
  • You're using `groupCount()` incorrectly. What it tells you is how many capturing groups there are in the regex. That's a static property of the Pattern object; it doesn't tell you anything about what was matched. Your regex has no groups, so `groupCount()` always returns zero. – Alan Moore Sep 05 '15 at 09:42
  • Do not parse HTML using regular expressions. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Andreas Sep 05 '15 at 09:50
  • @Andreas He is not parsing HTML but just finding a particular string within a larger string, which IMHO is fine. Also, we know that post, [you can stop linking](http://meta.stackoverflow.com/questions/261561/please-stop-linking-to-the-zalgo-anti-cthulhu-regex-rant). – tobias_k Sep 05 '15 at 10:06

4 Answers

3

You can call matcher.find() again and it will find the next match. find() does not only tell you whether anything was found; it actively searches for the next match. After you call it once, matcher.group() returns the first match. After you call it again, matcher.group() returns the second match, and so on. Repeat until it finds nothing, then take the last result. Also, you do not really need groupCount, as the number of groups is always the same for this pattern: zero.

String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
Pattern pattern3 = Pattern.compile("<span.*?(?=</span>)");
Matcher matcher3 = pattern3.matcher(text);
String s = null;
while (matcher3.find()) {
    s = matcher3.group();
} 
System.out.println(s);

Output is <span>Cat 1 | Cat 2 | Cat 3 (without the closing tag, because the lookahead does not consume it).
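
To make the point about groupCount concrete, here is a minimal sketch. The class name GroupCountDemo, the helper countMatches and the shortened input string are made up for illustration. It shows that groupCount() depends only on the pattern, while the number of matches has to be counted by calling find() repeatedly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: counts matches by calling find() until it returns false.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Shortened input, used only for this illustration.
        String text = "<li><span>answer 1.</span></li><li><span>answer 2</span></li>";
        Pattern noGroups = Pattern.compile("<span.*?(?=</span>)");
        Pattern oneGroup = Pattern.compile("<span>(.*?)</span>");

        // groupCount() is known before any match is attempted:
        // a lookahead is not a capturing group, so the first pattern has none.
        System.out.println(noGroups.matcher(text).groupCount()); // 0
        System.out.println(oneGroup.matcher(text).groupCount()); // 1

        // The number of matches is a different thing and comes from find().
        System.out.println(countMatches(noGroups, text)); // 2
    }
}
```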

If you want to use just what's within the <span> tags, you can use regex "<span>(.*?)</span>" and matcher3.group(1) to get what's within the first pair of () (or put the tags in lookahead and lookbehind, but IMHO it's easier this way).
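
A sketch of that capturing-group variant, combined with the same find() loop, might look like this. The class name LastSpanContent and the method lastSpanContent are only illustrative names, and returning null when nothing matches is an assumption about what you want in that case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastSpanContent {
    // Returns the text inside the last <span>...</span>, or null if there is none.
    static String lastSpanContent(String html) {
        Matcher m = Pattern.compile("<span>(.*?)</span>").matcher(html);
        String last = null;
        while (m.find()) {
            last = m.group(1); // group(1) is what the first pair of () captured
        }
        return last;
    }

    public static void main(String[] args) {
        String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
        System.out.println(lastSpanContent(text)); // Cat 1 | Cat 2 | Cat 3
    }
}
```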

tobias_k
1

Although you asked about finding the last occurrence using regex, also consider using jsoup, a well-tested Java library for parsing HTML. It is also better from a readability point of view.

See finding last occurrence using jsoup

M Sach
0

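Use a greedy quantifier * with . to find the last occurrence. The greedy .* first consumes as much of the input as it can and then backtracks, so the rest of the pattern matches at the last <span>.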

(?s)^.*<span[^>]*>(.*?)</span>

The first group, matcher3.group(1), captures the content of the last span (regexplanet demo).
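
In Java, this could be used roughly as follows. It is a minimal sketch: the class name GreedyLastSpan, the constant LAST_SPAN and the method lastSpanContent are made-up names, and returning null for input without a span is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyLastSpan {
    // (?s) lets . match line breaks too; the greedy ^.* pushes the match to the last <span>.
    private static final Pattern LAST_SPAN =
            Pattern.compile("(?s)^.*<span[^>]*>(.*?)</span>");

    // Returns the text inside the last <span>...</span>, or null if there is none.
    static String lastSpanContent(String html) {
        Matcher m = LAST_SPAN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
        System.out.println(lastSpanContent(text)); // Cat 1 | Cat 2 | Cat 3
    }
}
```

A single find() is enough here, at the price of the regex engine backtracking from the end of the input.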

Pshemo
-1

Try this:

String text = "<div><ul ><li><span>answer 1.</span></li><li><span>answer 2</span></li><li><span>answer3.</span></li><li><span>answer 4</span></li></ul><div><span>Cat 1 | Cat 2 | Cat 3</span></div></div>";
Pattern pattern3 = Pattern.compile("<span.*?(?=</span>)");
Matcher matcher3 = pattern3.matcher(text);
String in = null;
// Keep calling find(); after the loop, in holds the last match.
while (matcher3.find()) {
    in = matcher3.group(); // same as group(0): the pattern has no capturing groups
}
Super Hornet