I am writing a small java program for a class, and I can't quite figure out why my regex isn't working properly. In the special case of having 2 tags on the same line that is read in, it only matches the second one.
Here is a link that has the regex included, along with a simple set of test data: Regex Test Link.
In my java program I have the following code:
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
String[] results;
System.out.println(p.toString());
Matcher m = null;
while((line = input.readLine()) != null) {
m = p.matcher(line);
while(m.find()) {
System.out.println("Matches: " + m.group(1));
}
}
The goal is to extract the href value, as long as it starts with http://, the website ends in either no page (like http://www.google.com) or ends in index.htm or index.html (like http://www.google.com/index.html).
My regex works for every case of the above, but doesnt match in the special case of 2 tags that are on the same line.
Any help is appreciated.