I have the following code to extract all URLs (http only) from a page's source (String text):
private synchronized void addLinks(String text) {
    // Match http:// URLs; the last character class excludes trailing punctuation.
    String regex = "\\b(http)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
    Pattern urlPattern = Pattern.compile(regex);
    Matcher matcher = urlPattern.matcher(text);
    while (matcher.find()) {
        // group() returns the whole matched URL
        String urlStr = matcher.group();
        // do something
    }
}
I need to extend the regex so that it matches only URLs that link to text pages. Is that possible?
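To make the goal concrete, here is a rough sketch of the kind of filtering I have in mind. It assumes that "text page" can be approximated by "the URL does not end in a known binary file extension", which is only a heuristic: the URL alone does not reveal the content type, and checking the Content-Type response header would be more reliable. The class name TextLinkFilter, the method extractTextLinks, and the extension list are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextLinkFilter {
    // Same URL pattern as in the code above, without the capturing group.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\bhttp://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Assumed list of extensions treated as non-text; adjust as needed.
    // An optional query string or fragment may follow the extension.
    private static final Pattern BINARY_EXTENSION = Pattern.compile(
            "\\.(?:jpe?g|png|gif|bmp|ico|pdf|zip|gz|exe|mp3|mp4|avi)(?:[?#].*)?$",
            Pattern.CASE_INSENSITIVE);

    // Returns the http URLs found in text that do not look like binary files.
    static List<String> extractTextLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            String url = matcher.group();
            if (!BINARY_EXTENSION.matcher(url).find()) {
                links.add(url);
            }
        }
        return links;
    }
}
```

Filtering in a second step keeps the URL regex readable. The same exclusion could probably be folded into the URL regex as a negative lookahead, but I am not sure that is worth the added complexity.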