I am trying to parse a document with jsoup (Java). This is my Java code:
package test;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class crawler {
    private static final int TIMEOUT_IN_MS = 5000;

    public static void main(String[] args) throws MalformedURLException, IOException {
        // Fetch the page over plain HTTP and print the parsed HTML
        Document doc = Jsoup.parse(new URL("http://www.internet.com/"), TIMEOUT_IN_MS);
        System.out.println(doc.html());
    }
}
OK, this works. But when I try to parse an HTTPS site, I get this error message:
Document doc = Jsoup.parse(new URL("https://www.somesite.com/"), TIMEOUT_IN_MS);
System.out.println(doc.html());
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://www.somesite.com/
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
    at org.jsoup.Jsoup.parse(Jsoup.java:183)
    at test.crawler.main(crawler.java:14)
I only get this error when I try to parse an HTTPS URL; plain HTTP works.
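Since 403 is an HTTP status code, the HTTPS connection itself seems to be established and the server is refusing the request. To narrow this down, here is a minimal sketch that uses only the JDK (no jsoup) and compares the status code with and without a browser-like User-Agent header. The class name StatusCheck, the helper fetchStatus and the User-Agent string are made up for illustration, and the idea that the server rejects the default Java client is only an assumption I have not confirmed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class StatusCheck {
    private static final int TIMEOUT_IN_MS = 5000;

    // Hypothetical helper: sends a GET request and returns only the HTTP status code.
    // If userAgent is null, the JDK's default User-Agent ("Java/<version>") is sent.
    static int fetchStatus(String url, String userAgent, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        conn.setRequestMethod("GET");
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // getResponseCode() returns 4xx/5xx codes instead of throwing
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "https://www.somesite.com/";
        // Assumed, illustrative User-Agent value; any browser-like string would do
        String browserLikeAgent = "Mozilla/5.0";

        System.out.println("Default User-Agent:      " + fetchStatus(url, null, TIMEOUT_IN_MS));
        System.out.println("Browser-like User-Agent: " + fetchStatus(url, browserLikeAgent, TIMEOUT_IN_MS));
    }
}
```

If the two status codes differ, the 403 probably depends on the request headers rather than on HTTPS as such. If both are 403, the cause is likely something else (for example cookies, a required login, or IP-based blocking).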