-3

I want to find the links in a web page (saved in a String variable) using a regular expression, specifically those defined by the tag "<a href="link"></a>" (starting with <a href= and ending with </a>). What should this regex look like, and what should I put in the ??? field? TIA ;)

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class Main {
    public static void main(String[] args) {
        String sourceOfHtml = "Some html code of webpage with links";

        regexChecker("???", sourceOfHtml);
    }

    public static void regexChecker(String theRegex, String str2check) {
        Pattern checkRegex = Pattern.compile(theRegex);
        Matcher regexmatcher = checkRegex.matcher(str2check);

        while (regexmatcher.find()) {
            if (regexmatcher.group().length() != 0) {
                System.out.println(regexmatcher.group().trim());
            }
        }
    }
}

sejseen

2 Answers

1

Don't use regex for this. Do use an HTML parser.

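// Requires the jsoup library (org.jsoup.Jsoup, org.jsoup.nodes.*, org.jsoup.select.Elements).
Document document = Jsoup.parse(sourceOfHtml);
Elements links = document.select("a[href]"); // every <a> element that has an href attribute
for (Element link : links) System.out.println(link.attr("href"));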
Matt Ball
0

You can try this regex :)

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
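
Note that the pattern above is written as a JavaScript-style literal (with / delimiters) and is anchored with ^ and $, so it checks whether a whole string looks like a URL rather than finding <a> tags inside a page. A minimal sketch of a pattern closer to the question's use case might look like the following, assuming href values are wrapped in double or single quotes and <a> tags are not nested. The class name LinkExtractor, the method extractLinks, and the sample HTML are made up for illustration. A regex like this stays fragile on real-world HTML, which is why a parser is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Illustrative pattern (an assumption, not a complete HTML grammar):
    // group 1 captures the quoted href value, group 2 the text between <a ...> and </a>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href value of every <a href="...">...</a> match found in html.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the page source.
        String sourceOfHtml = "<p><a href=\"https://example.com\">Example</a> and "
                + "<a class=\"nav\" href='/about'>About</a></p>";
        for (String link : extractLinks(sourceOfHtml)) {
            System.out.println(link);
        }
    }
}
```

To plug it into the regexChecker method from the question, the same pattern string could be passed as the first argument; group() would then print each whole <a ...>...</a> element instead of only the href value.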