I want to extract links from HTML using jsoup.
The expected output is an absolute link, so I read the attribute with "abs:href".
This works:
Jsoup.parse("<a \n\r\t href=\"http://www.ibm.com/123/?id=abc\">\nhaha</a>", "http://www.ibm.com").select("a").attr("abs:href");
delivers: http://www.ibm.com/123/?id=abc
This doesn't work:
Jsoup.parse("<a \n\r\t href=\"www.ibm.com/123/?id=abc\">\nhaha</a>", "http://www.ibm.com").select("a").attr("abs:href");
delivers: http://www.ibm.com/www.ibm.com/123/?id=abc
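For context, here is a minimal sketch that uses java.net.URI rather than jsoup. It suggests the second result is the standard behaviour. Under RFC 3986, an href with no scheme and no leading "//" is a relative path, so "www.ibm.com" is treated as a folder under the base. The class and method names are hypothetical, and the base is assumed to end in "/".

```java
import java.net.URI;

public class RelativeResolutionDemo {
    // Hypothetical helper: resolve an href against a base, RFC-style.
    static String resolveAgainst(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.ibm.com/"; // assumed trailing slash

        // Has a scheme, so it is absolute and returned unchanged.
        System.out.println(resolveAgainst(base, "http://www.ibm.com/123/?id=abc"));
        // No scheme, no leading "//": a relative path, so "www.ibm.com" is a folder.
        System.out.println(resolveAgainst(base, "www.ibm.com/123/?id=abc"));
        // Protocol-relative: "//" marks a host; the scheme is taken from the base.
        System.out.println(resolveAgainst(base, "//www.ibm.com/123/?id=abc"));
    }
}
```

Browsers resolve such an href the same way. If the page author means a host, the unambiguous markup is either a full "http://..." URL or the protocol-relative "//www.ibm.com/..." form.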
I know it's hard to tell whether "www.ibm.com" is meant as an absolute or a relative link. It could be a host name, but it could just as well be a folder name. Is there a proven solution? The only hack that comes to mind is this:
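String domain = baseUrl.replace("http://", "");   // baseUrl = "http://www.ibm.com", so domain = "www.ibm.com"
absLink = absLink.replace(domain + "/" + domain, domain);   // the duplicate is separated by "/"; String.replace returns a new string, so reassign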
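Here is a sketch of a less fragile variant of the hack. It works on the raw href (for example attr("href") instead of "abs:href") and prepends "http://" before resolving, rather than patching the resolved URL afterwards. This rests on an assumption: a scheme-less href whose first segment starts with "www." is meant as a host name. That "www." rule and the class and method names are hypothetical. The rule will still misread a real folder named like "www.example.com", which is exactly the ambiguity in question.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class SchemelessLinkFix {
    // Assumption: "www.<something>.<something>" at the start of a scheme-less
    // href is a host name, optionally followed by a path, query or fragment.
    private static final Pattern LOOKS_LIKE_HOST =
            Pattern.compile("^www\\.[^/?#]+\\.[^/?#]+([/?#].*)?$", Pattern.CASE_INSENSITIVE);

    // Detects an href that already has a scheme such as "http:" or "mailto:".
    private static final Pattern HAS_SCHEME = Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.-]*:");

    // Hypothetical helper: returns an absolute URL for rawHref against baseUri.
    static String toAbsolute(String baseUri, String rawHref) {
        String href = rawHref.trim();
        boolean schemeless = !HAS_SCHEME.matcher(href).find() && !href.startsWith("//");
        if (schemeless && LOOKS_LIKE_HOST.matcher(href).matches()) {
            href = "http://" + href; // heuristic: treat the first segment as a host
        }
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.ibm.com/"; // assumed trailing slash
        System.out.println(toAbsolute(base, "http://www.ibm.com/123/?id=abc"));
        System.out.println(toAbsolute(base, "www.ibm.com/123/?id=abc"));
        System.out.println(toAbsolute(base, "images/logo.png"));
    }
}
```

Ordinary relative paths such as "images/logo.png" are still resolved against the base. Protocol-relative hrefs ("//www.ibm.com/...") skip the heuristic because they already name a host.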