Jsoup SocketTimeoutException: Read timed out

Question

I get a SocketTimeoutException when I try to parse a lot of HTML documents using Jsoup.

For example, I have a list of links:

<a href="www.domain.com/url1.html">link1</a>
<a href="www.domain.com/url2.html">link2</a>
<a href="www.domain.com/url3.html">link3</a>
<a href="www.domain.com/url4.html">link4</a>

For each link, I parse the document linked to the URL (from the href attribute) to get other pieces of information in those pages.

I can imagine that this takes a lot of time, but how can I get rid of this exception? Here is the whole stack trace:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
    at app.ForumCrawler.crawl(ForumCrawler.java:50)
    at Main.main(Main.java:15)

The code you added in your edit sets the timeout to infinite. This is undesirable in most use cases. It is much better to use a specific timeout, as indicated in MarcoS's answer, even if the timeout is long. – stepanian, Dec 20 '14 at 22:44
I guess `timeout(0)` will make Jsoup try to connect to the URL again and again until it connects. – Evan Hu, Jul 12 '15 at 02:45
This seems to be the solution found by the question author [C. Maillard](https://stackoverflow.com/users/817143): `Jsoup.connect(url).timeout(0).get();`, as per an [earlier revision](https://stackoverflow.com/revisions/6571548/6). – Scratte, Sep 13 '20 at 11:50

score 147 · Answer 1 · edited Nov 20 '18 at 11:46

I think you can do

Jsoup.connect("...").timeout(10 * 1000).get();

which sets the timeout to 10 seconds (the argument is in milliseconds).
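The stack trace in the question shows the failing read happening inside the JDK's HttpURLConnection, underneath Jsoup. As a rough illustration of what "Read timed out" means, the sketch below uses only JDK classes, not Jsoup: it connects to a local socket that never sends a response and reports whether the read timeout fired. The class name ReadTimeoutDemo and the method timesOut are made up for this example, and it does not claim to mirror Jsoup's internals.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

// Hypothetical demo class; not part of Jsoup.
final class ReadTimeoutDemo {

    /**
     * Sends an HTTP request to a local socket that never answers.
     * Returns true if the read timeout fired, false if a response arrived.
     */
    static boolean timesOut(int timeoutMillis) {
        // The listening socket is never accept()ed, so no HTTP response
        // is ever written back to the client.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URI uri = URI.create("http://127.0.0.1:" + silentServer.getLocalPort() + "/");
            HttpURLConnection conn =
                    (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit for waiting on response data
            try {
                conn.getResponseCode(); // blocks while waiting for the status line
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception type as in the question's stack trace
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A longer timeout only gives a slow server more time to answer; if the server never responds, the exception is still thrown once the limit is reached.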

edited Nov 20 '18 at 11:46

frogatto


answered Jul 04 '11 at 12:40

MarcoS


5

121 upvotes but no explanation of why this fixes the issue? Why does that address the problem when the default is, it would appear, 30 seconds? – Alan Hay Nov 02 '17 at 09:17
2

@AlanHay my answer was suggesting to solve the problem by setting a timeout, not by using that specific value as a timeout :) – MarcoS Nov 03 '17 at 15:32
2

@AlanHay, the default timeout when the Q & A were written was 3 seconds. So any increase would have lowered the socket timeout frequency and helped to fix the issue. I updated the default to 30 seconds in 2016. – Jonathan Hedley Jan 09 '21 at 21:27

amaidment · Answer 2 · 2017-03-24T09:31:50.777

27

Ok - so, I tried to offer this as an edit to MarcoS's answer, but the edit was rejected. Nevertheless, the following information may be useful to future visitors:

According to the javadocs, the default timeout for an org.jsoup.Connection is 30 seconds.

As has already been mentioned, this can be set using timeout(int millis).

As the OP notes in the edit, it can also be set using timeout(0). However, as the javadocs state:

A timeout of zero is treated as an infinite timeout.
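Even with a finite timeout, a crawl over many links can still hit an occasional slow page. One possible way to handle that, which nobody in this thread proposed, is a bounded retry with a linearly growing pause between attempts. The sketch below is only an assumption about how that could look: TimeoutRetry, Fetch and withRetries are hypothetical names, and the helper uses only JDK classes so that any call throwing IOException can be wrapped.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical helper; not part of Jsoup.
final class TimeoutRetry {

    /** A fetch that may fail with an I/O error, for example a wrapped Jsoup call. */
    interface Fetch<T> {
        T run() throws IOException;
    }

    /**
     * Runs the fetch up to maxAttempts times. Only SocketTimeoutException
     * triggers a retry; other IOExceptions propagate immediately.
     * Sleeps backoffMillis * attempt between attempts (linear backoff).
     */
    static <T> T withRetries(Fetch<T> fetch, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.run();
            } catch (SocketTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last; // every attempt timed out
    }
}
```

Assuming Jsoup is on the classpath, a call could look like TimeoutRetry.withRetries(() -> Jsoup.connect(url).timeout(10 * 1000).get(), 3, 500), which keeps the timeout finite and gives up after three timed-out attempts.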

edited Mar 24 '17 at 09:31

answered Oct 18 '12 at 14:38

amaidment


3

Setting an infinite timeout is a bad idea in most cases. Use a long timeout, but always specify one. See MarcoS's answer. – stepanian Dec 20 '14 at 22:46
3

@stepanian - to be clear, I'm not advocating setting an infinite timeout. This had been suggested as the solution by the OP, although I wanted to direct future users to the implications of this. Indeed, when I originally posted my 'answer', I indicated that I thought it should have been an edit to MarcoS's answer, as there was some additional information that might be useful to future users... but the edit was rejected. – amaidment Jan 06 '15 at 18:51
The default timeout is not 3 seconds, but 30 seconds (30000 millis); you can see it at https://jsoup.org/apidocs/org/jsoup/Connection.html – aldok Mar 04 '17 at 13:34
The timeout used to be 3 seconds, back when the question was written. – Jonathan Hedley Jan 09 '21 at 21:28

score 4 · Answer 3 · answered Jan 11 '19 at 13:05

I had the same error:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)

and only setting .userAgent("Opera") worked for me.

So I used the Connection userAgent(String userAgent) method of the Connection class to set the Jsoup user agent.

Something like:

Jsoup.connect("link").userAgent("Opera").get();

score 3 · Answer 4 · answered Feb 06 '18 at 14:36

There is a mistake on https://jsoup.org/apidocs/org/jsoup/Connection.html. The default timeout is not 30 seconds; it is 3 seconds. Just look at the Javadoc in the source code: it says 3000 ms.

answered Feb 06 '18 at 14:36

Bartek


1

On java doc: "The default timeout is 30 seconds (30,000 millis). A timeout of zero is treated as an infinite timeout." https://jsoup.org/apidocs/org/jsoup/Connection.html – jeton May 22 '18 at 16:22

score -1 · Answer 5 · edited Jun 27 '19 at 06:27

This should work: Jsoup.connect(url.toLowerCase()).timeout(0).get();

edited Jun 27 '19 at 06:27

Masoud Rahimi


answered Jun 27 '19 at 06:01

Prasanna Mendon


score -6 · Answer 6 · answered Dec 19 '13 at 04:48

Set a timeout when connecting with Jsoup.

answered Dec 19 '13 at 04:48

Gaurab Pradhan


2

Please add further info about your answer – Joe Taras May 11 '16 at 13:21
Please support your answer with explanations and code snippets if necessary. – Swapnil B. Sep 04 '18 at 22:28
