String HTML = ...; // some HTML source code that contains the strings a and b below

String a = "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=6912391300348162186\">88</a>";

String b = "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=18217435431424551679\">41</a>";

String ex = ?; // the regular expression I am looking for

Pattern patternObject = Pattern.compile(ex);
Matcher matcherObject = patternObject.matcher(HTML);

while (matcherObject.find()) {
    System.out.println("DEBUG: Cite is " + matcherObject.group(1));
}

Hi, I am new to Java and regex, and I am wondering how I can write the String ex so that the program prints only the following (I hope I am clear enough):

Cite is 88

Cite is 41

DwB
user116064

2 Answers

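String ex = ".*>(\\d+)<.*";

If you only want the digits, you can ignore everything else. I don't know how you apply a URL to the HTML, so this test reads one line containing a single link from user input.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CiteTest {
    public static void main(String[] args) throws IOException {
        // Read one line of HTML (a single link) from standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String HTML = in.readLine();

        // \\d+ accepts every digit, including 0 (so "10" or "105" also match).
        String ex = ".*>(\\d+)<.*";

        Pattern patternObject = Pattern.compile(ex);
        Matcher matcherObject = patternObject.matcher(HTML);

        while (matcherObject.find()) {
            System.out.println("DEBUG: Cite is " + matcherObject.group(1));
        }
    }
}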
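One thing worth checking with a pattern that starts with `.*>` is how it behaves when several links sit on the same line: the leading `.*` is greedy, so `find()` consumes as much of the line as it can and reports only the last number. The sketch below compares that form with the shorter `>(\\d+)<`; the class name `GreedyDemo` and the helper `cites` are hypothetical names used only for this illustration, and both links from the question are assumed to be on one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> cites(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Both links from the question, joined on a single line (an assumption of this demo).
        String html =
            "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=6912391300348162186\">88</a>"
          + "<a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=18217435431424551679\">41</a>";

        // Greedy prefix: the first match swallows the whole line, leaving only the last number.
        System.out.println(cites(".*>(\\d+)<.*", html)); // prints [41]

        // No prefix: find() advances from one ">digits<" occurrence to the next.
        System.out.println(cites(">(\\d+)<", html));     // prints [88, 41]
    }
}
```

The long `cites=` number is skipped in both cases because it sits between `=` and `"`, not between `>` and `<`.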
MeowMeow
  • How do I ignore the 6912391300348162186? There can be different strings with different digits. – user116064 Aug 10 '14 at 13:40
  • .*> should ignore everything prior to the group right after the >. I'll update my answer with the test code. – MeowMeow Aug 10 '14 at 13:42

You can try this:

Pattern patternObject = Pattern.compile("<a class=\"cit-dark-link(.*?)cites=(\\d+)\">(.*?)</a>");
Matcher matcherObject = patternObject.matcher(HTML);

while (matcherObject.find()) {
    // group(3) is the link text, i.e. the citation count
    System.out.println("DEBUG: Cite is " + matcherObject.group(3));
}

This prints:

DEBUG: Cite is 88
DEBUG: Cite is 41
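
If only the link text is needed, a single capturing group may be enough. The following is a minimal sketch, not a definitive solution: the class name `CiteExtractor`, the method `extractCites`, and the decision to parse the counts as `int` are assumptions made for illustration, and it assumes every link is written exactly as in the question (same attribute order, double quotes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CiteExtractor {
    // Matches the link shape from the question; only the link text is captured (group 1).
    // [^"]* stays inside the href value, so a match cannot run on into a later link.
    private static final Pattern CITE_LINK = Pattern.compile(
            "<a class=\"cit-dark-link\" href=\"[^\"]*cites=\\d+\">(\\d+)</a>");

    // Hypothetical helper: returns the citation counts in document order.
    static List<Integer> extractCites(String html) {
        List<Integer> counts = new ArrayList<>();
        Matcher m = CITE_LINK.matcher(html);
        while (m.find()) {
            // Assumes the count fits in an int; parseInt throws NumberFormatException otherwise.
            counts.add(Integer.parseInt(m.group(1)));
        }
        return counts;
    }

    public static void main(String[] args) {
        String html =
            "<p><a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=6912391300348162186\">88</a></p>\n"
          + "<p><a class=\"cit-dark-link\" href=\"http://scholar.google.ca/scholar?oi=bibs&hl=en&cites=18217435431424551679\">41</a></p>";

        for (int count : extractCites(html)) {
            System.out.println("Cite is " + count);
        }
    }
}
```

For anything beyond a fixed, known link format, an HTML parser is usually more robust than a regular expression.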
user3487063