0

I am getting a compile-time error with the following code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class gfile
 {
  public static void main(String args[]) {
    // create a Pattern
    Pattern p = Pattern.compile("<div class="dinner">(.*?)</div>");//some prob with this line


    // create a Matcher and use the Matcher.group() method
  String can="<tr>"+
                          "<td class="summaryinfo">"+

                                "<div class="dinner">1,000</div>" +
                                "<div style="margin-top:5px " +
                                 "font-weight:bold">times</div>"+
                            "</td>"+
                        "</tr>";

    Matcher matcher = p.matcher(can);
    // extract the group

    if(matcher.find())
     {
    System.out.println(matcher.group());
     }
  else
     System.out.println("could not find");
  }
}
giri
  • 2
    Please let us know what you expect it to do, and in what way it is failing: what unexpected result are you seeing. – Jacob Mattison Mar 10 '10 at 20:19
  • Be more specific. Where is it "wrong"? Does it compile? Does it crash? Does it produce the "wrong" result? – CaffGeek Mar 10 '10 at 20:19
  • 8
    What's wrong is that it's using regex on html – James Kolpack Mar 10 '10 at 20:20
  • A big +1 for James Kolpack! See http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html – TrueWill Mar 10 '10 at 20:22
  • @girinie, then it's because you didn't escape your quotes. See ericofsac's answer – CaffGeek Mar 10 '10 at 20:26
  • @James Kolpack: HTML is not a regular language. Only parts of it are regular. So you cannot use a regular expression to process it. – Gumbo Mar 10 '10 at 20:26
  • Mandatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Javier Mar 10 '10 at 20:30
  • @girinie - using an XPath expression, an equivalent query would look like //div[@class="dinner"]/text() - it does the matching against the structure of the tags instead of raw text. Much easier to write and maintain. – James Kolpack Mar 10 '10 at 21:23
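The XPath query from the last comment can be tried with the JDK's built-in XML APIs. This is only a minimal sketch: it assumes the fragment is well-formed XML (the sample string in the question happens to be), and the class name `XPathDinner` and method name `dinnerText` are made up for illustration. Real-world HTML is often not well-formed and would need a dedicated HTML parser instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathDinner {
    // Hypothetical helper: parses a well-formed markup fragment and returns the
    // text of the first <div class="dinner">, or "" if there is no such node.
    static String dinnerText(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // With a String result, evaluate() yields the string value of the
            // first node selected by the expression.
            return xpath.evaluate("//div[@class=\"dinner\"]/text()", doc);
        } catch (Exception e) {
            // Malformed markup or an invalid expression ends up here.
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String can = "<tr>" +
                "<td class=\"summaryinfo\">" +
                "<div class=\"dinner\">1,000</div>" +
                "<div style=\"margin-top:5px font-weight:bold\">times</div>" +
                "</td>" +
                "</tr>";
        System.out.println(dinnerText(can)); // prints 1,000
    }
}
```

The query selects by element structure and attribute value, so it does not depend on how the surrounding text is laid out.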

5 Answers

7

You have unescaped quotes inside your call to Pattern.compile.

Change:

Pattern p = Pattern.compile("<div class="dinner">(.*?)</div>");

To:

Pattern p = Pattern.compile("<div class=\"dinner\">(.*?)</div>");

Note: I just saw the same problem in your String can.

Change it to:

  String can="<tr>"+
                      "<td class=\"summaryinfo\">"+

                            "<div class=\"dinner\">1,000</div>" +
                            "<div style=\"margin-top:5px " +
                             "font-weight:bold\">times</div>"+
                        "</td>"+
                    "</tr>";

I don't know if this fixes it, but it will at least compile now.

Eric G
1

But your regex contains (.*?), which means "any character, any number of repetitions, as few as possible".

Meaning, it matches nothing...and everything.

...or the fact that your quotes aren't escaped.

CaffGeek
  • The (.*?) is valid. It matches zero or more characters of any value between the given two elements. The "?" just makes that non-greedy so it locates the very next `</div>`. – Kevin Brock Mar 11 '10 at 13:06
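To make the greedy versus non-greedy distinction from the comment above concrete, here is a small sketch that runs three variants of the pattern against a shortened version of the question's input. The class name `QuantifierDemo` and the helper `firstGroup` are illustrative only; the input string is a simplified assumption, not the exact string from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the regex does not match the input at all.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div class=\"dinner\">1,000</div><div>times</div>";

        // Non-greedy: stops at the first closing </div>.
        System.out.println(firstGroup("<div class=\"dinner\">(.*?)</div>", html));
        // prints 1,000

        // Greedy: runs on to the last closing </div>.
        System.out.println(firstGroup("<div class=\"dinner\">(.*)</div>", html));
        // prints 1,000</div><div>times

        // Optional greedy group: the group is still greedy, so for this input
        // the result is the same as the plain greedy version.
        System.out.println(firstGroup("<div class=\"dinner\">(.*)?</div>", html));
        // prints 1,000</div><div>times
    }
}
```

Only the non-greedy form isolates the contents of the first div when more than one closing tag follows.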
0

You should use an HTML parser to parse and process HTML - not a regular expression.

Shlomi Fish
  • 1
    This doesn’t answer the question. – Gumbo Mar 10 '10 at 20:25
  • @allthosewhocomplain about Regexes with HTML, XML, etc. It's bad if you have it unstructured, and the formatting can change. BUT, if you KNOW the format, and KNOW it won't have surprises, there is no reason not to use a Regex. Just because it's often not the correct course, doesn't mean it never is. – CaffGeek Mar 10 '10 at 20:35
  • @Gumbo: The only question I see up there is "Can anybody tell me what's wrong with this code?". Assuming that we want to answer the implied questions ("What's wrong with this code?") other than a simple "Yes" answer, this is a perfectly good answer. The code is trying to parse and process HTML with regexes, and that is fundamentally what's wrong with it. – David Thornley Mar 10 '10 at 20:49
  • @David Thornley: girinie said, he/she is getting a compiler error. And using regular expressions to process HTML doesn’t cause compiler errors. – Gumbo Mar 10 '10 at 21:09
  • @Gumbo: I know. There is a minor thing wrong with the code that is causing a compiler error. The major thing wrong with it is that it uses regexes to parse HTML. I'm serious about this: telling people about minor flaws in trying to do the wrong thing isn't really helpful. – David Thornley Mar 10 '10 at 21:17
  • @David Thornley: girinie is asking what is causing the compiler error. Answering with “don’t use regular expressions to process HTML” is *not* an answer to *that* question. That’s what the comments are for. – Gumbo Mar 10 '10 at 21:28
0

As already pointed out, you'll need to escape the double quotes inside all of your strings.

And if you want "1,000" as the result, you'll need to use group(1); otherwise you'll get the complete match of the pattern, including the surrounding div tags.

Resulting code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class gfile
 {
  public static void main(String args[]) {
    // create a Pattern
    Pattern p = Pattern.compile("<div class=\"dinner\">(.*?)</div>");

    // create a Matcher and use the Matcher.group() method
    String can="<tr>"+
                          "<td class=\"summaryinfo\">"+

                                "<div class=\"dinner\">1,000</div>" +
                                "<div style=\"margin-top:5px " +
                                 "font-weight:bold\">times</div>"+
                            "</td>"+
                        "</tr>";

    Matcher matcher = p.matcher(can);

    if(matcher.find())
    {
       System.out.println(matcher.group(1));
    }
    else
       System.out.println("could not find");
  }
}
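As a side-by-side illustration of the difference between group() and group(1) described in this answer, the following sketch returns both values from a single match. The class name `GroupDemo` and the method `wholeAndCaptured` are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    private static final Pattern DINNER =
            Pattern.compile("<div class=\"dinner\">(.*?)</div>");

    // Hypothetical helper: returns { whole match, capture group 1 },
    // or null when the pattern is not found in the input.
    static String[] wholeAndCaptured(String input) {
        Matcher m = DINNER.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group() is shorthand for group(0): everything the pattern matched.
        // group(1) is only the text inside the first pair of parentheses.
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = wholeAndCaptured(
                "<td class=\"summaryinfo\"><div class=\"dinner\">1,000</div></td>");
        System.out.println(r[0]); // prints <div class="dinner">1,000</div>
        System.out.println(r[1]); // prints 1,000
    }
}
```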
orithena
-1

(.*?) might need to be (.*)?

Seaux