I have a regexp that extracts an id and a label from HTML source code. It can be found HERE.
As you can see, it works fine there and it is fast, but when I try the same regexp in Java with the same source code, it 1. takes forever and 2. matches only one string (everything from the first a to the last a is one match). I tried it with the Multiline flag on and off, but it made no difference. I don't understand how a regexp can work everywhere but in Java. Any ideas?
private static final String COURSE_REGEX = "<a class=\"list-group-item list-group-item-action \" href=\"https:\\/\\/moodle-hs-ulm\\.de\\/course\\/view\\.php\\?id=([0-9]*)\"(?:.*\\s){7}<span class=\"media-body \">([^<]*)<\\/span>";
Pattern pattern = Pattern.compile(COURSE_REGEX, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(sourceCode);
List<String> courses = new ArrayList<>();
while (matcher.find() && matcher.groupCount() == 2) {
    courses.add(matcher.group(1) + "(" + matcher.group(2) + ")");
}
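To make the problem easier to reproduce, here is a minimal self-contained sketch of the same code. The class name `CourseExtractor`, the helpers `courseBlock` and `extractCourses`, and the sample markup are hypothetical stand-ins, not the real Moodle page. The sample assumes `\n` line endings and exactly six lines between the link line and the label line, which is the layout the pattern expects. The regexp itself is unchanged. The `groupCount()` check is left out because `Matcher.groupCount()` reports the number of groups in the pattern, not in the current match, so it is always 2 here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CourseExtractor {
    // Same pattern as above, unchanged.
    private static final String COURSE_REGEX = "<a class=\"list-group-item list-group-item-action \" href=\"https:\\/\\/moodle-hs-ulm\\.de\\/course\\/view\\.php\\?id=([0-9]*)\"(?:.*\\s){7}<span class=\"media-body \">([^<]*)<\\/span>";

    // Hypothetical stand-in for one course entry of the real page.
    // Assumption: lines end in "\n" and six lines sit between the
    // <a ...> line and the media-body <span>.
    static String courseBlock(String id, String label) {
        return String.join("\n",
            "<a class=\"list-group-item list-group-item-action \" href=\"https://moodle-hs-ulm.de/course/view.php?id=" + id + "\" data-key=\"" + id + "\">",
            "<div class=\"media\">",
            "<span class=\"media-left\">",
            "<img src=\"icon.png\">",
            "</span>",
            "<!-- filler -->",
            "<!-- filler -->",
            "<span class=\"media-body \">" + label + "</span>",
            "</div>",
            "</a>",
            "");
    }

    // Collects "id(label)" for every match, as in the original loop.
    static List<String> extractCourses(String sourceCode) {
        // MULTILINE only changes how ^ and $ behave; it is kept to mirror the original.
        Pattern pattern = Pattern.compile(COURSE_REGEX, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(sourceCode);
        List<String> courses = new ArrayList<>();
        while (matcher.find()) {
            courses.add(matcher.group(1) + "(" + matcher.group(2) + ")");
        }
        return courses;
    }

    public static void main(String[] args) {
        String sourceCode = courseBlock("123", "Mathematics") + courseBlock("456", "Physics");
        System.out.println(extractCourses(sourceCode));
    }
}
```

On this small, regular input the pattern finds each course separately, so the slowdown and the single oversized match presumably depend on something in the real page that the sample does not reproduce, such as its line endings or the number of lines per entry.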