
I'm trying to scrape Google with jsoup, requesting the page once every 10 seconds. After a while it starts failing with `org.jsoup.HttpStatusException: HTTP error fetching URL. Status=429`, which means I'm making too many requests, even though I only scrape once every 10 seconds.

Now every attempt returns `null`, because the request keeps failing with the Too Many Requests error and I can't scrape anymore. I even waited 10 minutes before trying again, but I still get the same error. How do I fix this?
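A common mitigation is to stop retrying on a fixed schedule and back off exponentially whenever a 429 comes back. This is a minimal sketch, not a guaranteed fix: nothing in it can lift a block the server has already applied. `TooManyRequestsException` and `Backoff.callWithBackoff` are hypothetical names; with jsoup you would throw the exception after catching an `org.jsoup.HttpStatusException` whose `getStatusCode()` is 429.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical exception a fetch wrapper throws when the server answers HTTP 429. */
class TooManyRequestsException extends IOException {
    TooManyRequestsException(String message) {
        super(message);
    }
}

final class Backoff {
    private Backoff() {}

    /**
     * Runs task, retrying only when it throws TooManyRequestsException.
     * Waits baseDelayMs, then 2x, 4x, ... between attempts, never more than
     * maxDelayMs. Any other exception propagates immediately, and after
     * maxAttempts rate-limited attempts the last 429 is rethrown.
     */
    static <T> T callWithBackoff(Callable<T> task, int maxAttempts,
                                 long baseDelayMs, long maxDelayMs) throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (TooManyRequestsException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do
                }
                Thread.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }
}
```

Doubling the delay (capped at `maxDelayMs`) spreads retries out instead of hitting the server at the same fixed interval that triggered the limit; adding random jitter to each delay is a frequent refinement.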

MainActivity.java:

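public String getContent(String link) throws InterruptedException, IOException {
        // Run the fetch on a worker thread and wait for it to finish.
        tuna tuna = new tuna(link);
        Thread thread = new Thread(tuna);
        thread.start();
        thread.join();
        // null if the fetch failed, e.g. with HTTP 429.
        return tuna.getValue();
}

// jsoup needs a full URL including the scheme (http:// or https://).
String link = "https://www.google.com";
String content = getContent(link);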

tuna.java:

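import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class tuna implements Runnable {

    String link;
    Document doc;
    String content; // stays null if the fetch fails

    public tuna(String x) {
        link = x;
    }

    public void run() {
        try {
            doc = Jsoup.connect(link).get();
            content = doc.html();
        } catch (IOException e) {
            // HttpStatusException (e.g. Status=429) extends IOException,
            // so an HTTP error ends up here and content is left null.
            e.printStackTrace();
        }
    }

    public String getValue() {
        return content;
    }
}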
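The `null` comes from `tuna.run()`: `HttpStatusException` is a subclass of `IOException`, so the 429 is caught, printed, and `content` is never set. One alternative, swapped in here for illustration, is to submit a `Callable` to an `ExecutorService`; `Future.get()` rethrows any failure wrapped in an `ExecutionException`, so the caller sees the 429 instead of a silent `null`. `ContentLoader` is a hypothetical class name, and the fetch is passed in as a lambda so the sketch compiles without jsoup. Like `thread.join()` in the original, `get()` blocks the calling thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ContentLoader {
    // One background thread reused for every fetch.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /**
     * Runs fetch on the background thread and waits for its result.
     * If fetch throws (for example an HTTP 429), the exception reaches the
     * caller as the cause of an ExecutionException rather than being lost.
     */
    String getContent(Callable<String> fetch) throws InterruptedException, ExecutionException {
        Future<String> future = executor.submit(fetch);
        return future.get(); // blocks until the fetch finishes or fails
    }

    /** Stops the background thread once no more fetches are needed. */
    void shutdown() {
        executor.shutdown();
    }
}
```

With jsoup on the classpath, a call might look like `loader.getContent(() -> Jsoup.connect("https://www.google.com").get().html())`, and a 429 would then surface as an `ExecutionException` whose cause is the `HttpStatusException`.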
frosty
  • You could try making fewer requests...? Scraping any website (google.com included) every 10 seconds is abusive IMO. – Sean Bright Jan 09 '20 at 22:16
  • @SeanBright OK, but what do I do now? When will it let me scrape again? It's been about 20 minutes already. – frosty Jan 09 '20 at 22:19
  • @SeanBright How many requests per minute does Google allow before it returns the Too Many Requests error? – frosty Jan 09 '20 at 22:20
  • Does this answer your question? [How to avoid HTTP error 429 (Too Many Requests) python](https://stackoverflow.com/questions/22786068/how-to-avoid-http-error-429-too-many-requests-python) – Sean Bright Jan 09 '20 at 22:21
  • @SeanBright Isn't that Python? I'm using Java in Android Studio. – frosty Jan 09 '20 at 22:22
  • The answer is still applicable. The error you are getting and how to handle it has nothing to do with the language you have chosen. – Sean Bright Jan 09 '20 at 22:23
  • @SeanBright The top voted answer literally just says "keep on waiting" and the other answers are written in python. – frosty Jan 09 '20 at 22:25
  • And then it goes on to talk about the `Retry-After` header. This is my last response on this topic. – Sean Bright Jan 09 '20 at 22:27
  • @SeanBright There isn't any Retry-After error in Java in Android Studio. Also, it's been about 40 minutes and I still can't scrape it without getting the error. I even searched the error log with Ctrl+F for anything called Retry-After, and there's nothing. – frosty Jan 09 '20 at 22:31
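`Retry-After` is an HTTP response header rather than an error type, which is why it never appears in the stack trace. RFC 6585 says a 429 response MAY include it, so the server is not obliged to send one. As far as I can tell, jsoup's `HttpStatusException` carries only the status code and URL, not the headers; one way to read the header is to call `Jsoup.connect(link).ignoreHttpErrors(true).execute()` and then check `response.statusCode()` and `response.header("Retry-After")`. The value is either a number of seconds or an HTTP-date. The hypothetical helper `RetryAfter.parse` below turns either form into a wait time and returns empty when the header is missing or unreadable, so the caller can fall back to its own backoff.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class RetryAfter {
    private RetryAfter() {}

    /**
     * Converts a Retry-After header value into a wait duration relative to now.
     * Accepts delay-seconds ("120") and an HTTP-date
     * ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns empty if the value is
     * missing, negative, or in neither form.
     */
    static Optional<Duration> parse(String value, Instant now) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        String v = value.trim();
        try {
            long seconds = Long.parseLong(v);
            // A negative delay is not valid; treat it like a missing header.
            return seconds >= 0 ? Optional.of(Duration.ofSeconds(seconds)) : Optional.empty();
        } catch (NumberFormatException notSeconds) {
            // Not delay-seconds; try the HTTP-date form below.
        }
        try {
            Instant at = ZonedDateTime.parse(v, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, at);
            // A date already in the past means it is fine to retry now.
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait);
        } catch (DateTimeParseException notADate) {
            return Optional.empty();
        }
    }
}
```

Taking `now` as a parameter instead of calling `Instant.now()` inside keeps the helper deterministic and easy to test.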

0 Answers