
I was hoping someone could help me understand why this happens:

    String s = "tbody\n" + "a\n" + "/tbody";
    Pattern p = Pattern.compile("tbody[^(/tbody)]+/tbody");

    Matcher m = p.matcher(s);

    while (m.find()) {
        System.out.println("found: \n\n" + m.group());
    }

The output is:

    found: 

    tbody
    a
    /tbody

But if I change the input to `String s = "tbody\n" + "ao\n" + "/tbody"` (I added an `o` after the `a`), it prints nothing. Can anyone tell me what I am missing?

I'm using NetBeans 7.4.

4J41
  • `[..]` in a regular expression is a *character class* - now you know the name, look it up :) In any case, consider just using a *non-greedy/lazy quantifier*: `tbody(.*?)/tbody` (you may also be interested in *word boundaries*). – user2864740 Jan 21 '14 at 22:27
  • You seem to be trying to figure out how to parse HTML with regular expressions. This is a non-starter, since HTML is not a regular language. Please read [this answer](http://stackoverflow.com/a/1732454/18157) – Jim Garrison Jan 21 '14 at 22:46
  • @JimGarrison I'm not sure that what I'm trying to do is parsing. I need to collect info from a specific website, which lies between those tags. – user2847339 Jan 22 '14 at 03:17
  • You'll be much better off if you use a real HTML parser like JSoup – Jim Garrison Jan 22 '14 at 04:01

1 Answer


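`[^(/tbody)]` is not what you think it is. It does not mean "any string other than `/tbody`". It is a negated character class: it matches any single character that is not one of `(`, `/`, `t`, `b`, `o`, `d`, `y` or `)`. Since `/tbody` contains an `o`, the `o` you added to the input is one of the excluded characters, so `[^(/tbody)]+` stops in front of it. The literal `/tbody` that must come next in the pattern is not found there, and that's why it does not match any more.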

Try adding `x` instead of `o` and it will keep working, because `x` is not among the characters you negated.
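To see this for yourself, here is a minimal sketch that runs the original pattern against the three inputs, plus one possible fix along the lines of the lazy quantifier suggested in the comments. The class name `TbodyRegexDemo` and the helper `firstMatch` are made up for illustration. Note that the lazy version is compiled with `Pattern.DOTALL` here: in Java, `.` does not match line terminators by default, and your input contains `\n`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TbodyRegexDemo {

    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The pattern from the question: a negated character class, not a negated string.
        String original = "tbody[^(/tbody)]+/tbody";

        // 'a' and '\n' are not in the class, so this matches.
        System.out.println(firstMatch(original, 0, "tbody\na\n/tbody") != null);   // true

        // 'o' is in the class, so [^(/tbody)]+ cannot get past it: no match.
        System.out.println(firstMatch(original, 0, "tbody\nao\n/tbody") != null);  // false

        // 'x' is not in the class, so this matches again.
        System.out.println(firstMatch(original, 0, "tbody\nax\n/tbody") != null);  // true

        // One possible alternative: a lazy quantifier, with DOTALL so '.' also matches '\n'.
        String lazy = "tbody(.*?)/tbody";
        System.out.println(firstMatch(lazy, Pattern.DOTALL, "tbody\nao\n/tbody") != null);  // true
    }
}
```

This only shows why the match fails on these short strings. For real pages, the HTML parser suggested in the comments is likely the more robust route.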

peter.petrov