
This is my first post here, so don't hesitate to tell me if I'm doing something wrong.

I've written a few lines of code in Java 11 to get information from a web store with Jsoup (1.14.2). Since the web store has multiple pages of data, I'm using a loop to fetch all the URLs I want.

Here's a simplified example of what I'm doing:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

for (int i = 1; i < 36; i++) {
    String url = "https://www.play-in.com/rachat/hotlist/magic?p=" + i;
    Document doc;

    try {
        doc = Jsoup.connect(url).get();
    } catch (Exception e) {
        logger.info("Failed to get data from page " + i + " : " + e);
        continue; // nothing to parse if the request failed
    }

    // here I'm parsing the HTML to return an array of objects
}

When I run the program, I get:

[main] INFO  service.MagicBazarReader  -  Failed to get data from page 2 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[main] INFO  service.MagicBazarReader  -  Failed to get data from page 3 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[main] INFO  service.MagicBazarReader  -  Failed to get data from page 4 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[main] INFO  service.MagicBazarReader  -  Failed to get data from page 5 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[main] INFO  service.MagicBazarReader  -  Failed to get data from page 6 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[...]
[main] INFO  ISmellProfits  - Number of result after HTML parsing : 24

And so on. The first get() always succeeds and I can work with the result, but every subsequent Jsoup.connect(...).get() call seems to fail.

Since I'm calling an HTTPS URL, my first thought was a certificate issue, and I tried the solution in "How to connect via HTTPS using Jsoup?", but it didn't help. Besides, if a certificate were really needed, I shouldn't be able to access the URL the first time either, but I might be wrong here since I don't know much about this domain.

My second thought was to use a parallel stream:

import java.util.ArrayList;
import java.util.List;

List<String> links = new ArrayList<>();
for (int i = 1; i < 36; i++) {
    links.add("https://www.play-in.com/rachat/hotlist/magic?p=" + i);
}

links.parallelStream().forEach(link -> {
    // the page number is everything after the last '=' in the URL
    String page = link.substring(link.lastIndexOf('=') + 1);
    try {
        Document doc = Jsoup.connect(link).get();

        // here I'm parsing the HTML to return an array of objects
    } catch (Exception e) {
        logger.info("Failed to get data from page " + page + " : " + e);
    }
});

The results are better, but still not perfect:

[ForkJoinPool.commonPool-worker-17] INFO  service.MagicBazarReader  - Failed to get data from page 12 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-23] INFO  service.MagicBazarReader  -  Failed to get data from page 30 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-9] INFO  service.MagicBazarReader  -  Failed to get data from page 5 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-27] INFO  service.MagicBazarReader  -  Failed to get data from page 35 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[main] INFO  service.MagicBazarReader  -  Failed to get data from page 22 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-3] INFO  service.MagicBazarReader  -  Failed to get data from page 1 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-7] INFO  service.MagicBazarReader  -  Failed to get data from page 2 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-21] INFO  service.MagicBazarReader  -  Failed to get data from page 17 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[ForkJoinPool.commonPool-worker-5] INFO  service.MagicBazarReader  -  Failed to get data from page 31 : javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure  
[...]
[main] INFO  ISmellProfits  - Number of result after HTML parsing : 286

So I'm getting a lot more results after HTML parsing, but they are not consistent: I get a different number on every run, and I'm still getting SSLHandshakeException.
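If the failures depend on how many requests are in flight at once, it helps to control that number explicitly. A parallel stream runs on the common ForkJoinPool plus the calling thread (which is why `main` shows up in the log above), so its concurrency follows the machine's core count. Below is a minimal JDK-only sketch that caps concurrency with a fixed-size `ExecutorService` instead; `BoundedFetch`, `fetchAll` and the `Fetcher` interface are hypothetical names, and the fetch is passed in so the sketch does not depend on Jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedFetch {
    // Hypothetical functional interface: like Function, but allowed to throw
    // checked exceptions (e.g. the IOException thrown by Jsoup's get()).
    interface Fetcher<T> {
        T fetch(String url) throws Exception;
    }

    // Runs fetcher on every URL with at most maxConcurrent requests in flight.
    // Results come back in input order; a failed fetch leaves a null entry.
    static <T> List<T> fetchAll(List<String> urls, Fetcher<T> fetcher, int maxConcurrent)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.fetch(url));
            }
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has either completed or thrown.
            for (Future<T> future : pool.invokeAll(tasks)) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    System.err.println("Fetch failed: " + e.getCause());
                    results.add(null);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With Jsoup on the classpath, the question's code could call `BoundedFetch.fetchAll(links, url -> Jsoup.connect(url).get(), 4)` and get back a `List<Document>` where `null` marks a failed page. Trying a few limits and comparing the failure counts is one way to check whether concurrency plays a role at all.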

I'm running out of ideas, so I'm asking whether someone knows what is causing the exception to be thrown.
I'm new to Jsoup, so I still don't know it well. I think it could be that Jsoup can only have one connection at a time and the loop starts a new one before the previous one is closed.

Thanks for reading.

Janez Kuhar
Pulco
  • You should close your connection at the end of each loop iteration. Maybe the site doesn't allow more than one connection per client. – Pilpo Sep 21 '21 at 14:05
  • I just ran your code locally and could successfully get all 35 pages. Could your internet connection be unstable? Are you sure the exceptions are thrown by the code you shared? Or could the "here I'm parsing the HTML to return an array of objects" section be the culprit? – sp00m Sep 21 '21 at 14:07
  • @Pilpo Jsoup closes the connection on its own after the request is done. – Pulco Sep 21 '21 at 15:09
  • @sp00m Which code did you use? I did some debugging on Jsoup.connect().get() to be sure, and it does throw 'javax.net.ssl.SSLHandshakeException': Received fatal alert: handshake_failure. But I will try on another internet connection. – Pulco Sep 21 '21 at 15:11
  • @Pulco I ran https://pastebin.com/HBSfNX6V and all went good (Java 11 with Jsoup 1.14.2). – sp00m Sep 21 '21 at 15:14
  • Edit: I tried on another connection and got the same failure. – Pulco Sep 21 '21 at 15:17
  • @sp00m I tried your code and got: Getting page 1, Got page 1, Getting page 2, Getting page 3, javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure. The issue seems to be with the internet connection. – Pulco Sep 21 '21 at 15:18
  • @Pilpo - just FYI re: "You should close your connection after each end of loop". Jsoup manages the connection itself when you call get() or execute(). There's nothing to close. The connect() method just prepares a request, it doesn't actually make any network IO or have anything to leak. – Jonathan Hedley Sep 21 '21 at 23:59
  • @Pulco you might want to try rate limiting your requests, if there's a flaky network connection (maybe a proxy along the way), perhaps going slower would help. – Jonathan Hedley Sep 22 '21 at 00:00
  • @JonathanHedley I tried this https://pastebin.com/W6JEFHyW, adding a Guava RateLimiter to allow only one request per second, but it's still failing. Or maybe you were talking about other ways to limit my requests. – Pulco Sep 22 '21 at 12:57
  • So it seems the issue was with my computer in particular: I did a run on my laptop and it worked with no issues. I will check whether my antivirus software is blocking my connection. – Pulco Sep 23 '21 at 08:52
  • You can check if it's due to the number of concurrent threads by limiting them: https://www.baeldung.com/java-8-parallel-streams-custom-threadpool – sebge2 Oct 07 '21 at 12:14
  • **Which 11?** If it was specifically 11.0.1, failure on TLS1.3 connections _after_ the first completes (i.e. resumptions) [could be same as this (also jsoup!)](https://stackoverflow.com/questions/53913758/java-11-https-connection-fails-with-ssl-handshakeexception-while-using-jsoup) – dave_thompson_085 Nov 12 '21 at 02:47
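The rate-limiting suggestion in the comments (and the Guava `RateLimiter` the asker tried) amounts to spacing requests out in time. Here is a minimal JDK-only sketch of the same idea, swapping Guava for a hand-rolled throttle; `Throttle` and `acquire` are hypothetical names, not part of Jsoup or Guava.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: releases callers at most once per interval.
class Throttle {
    private final long intervalNanos;
    private long nextAllowedNanos;

    Throttle(long intervalMillis) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        this.nextAllowedNanos = System.nanoTime(); // the first call goes through at once
    }

    // Blocks until at least one interval has passed since the previous caller was released.
    // synchronized, so threads sharing one Throttle are released one at a time.
    synchronized void acquire() throws InterruptedException {
        long waitNanos;
        // Loop because sleep can return slightly before the requested nanoTime.
        while ((waitNanos = nextAllowedNanos - System.nanoTime()) > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
        nextAllowedNanos = System.nanoTime() + intervalNanos;
    }
}
```

In the question's loop, creating `new Throttle(1000)` once and calling `acquire()` right before each `Jsoup.connect(url).get()` would allow roughly one request per second, similar to `RateLimiter.create(1.0)`. As the comments show, slowing down did not make the handshake failures go away in this case.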

1 Answer

[RESOLVED]

So it seems the issue is with JDK 11 and the presence of TLSv1.3 among the protocols the JDK enables by default. I tried with JDK 16 and no longer have the issue. Here is a link to a post that goes deeper into the explanation: Java 11 and 12 SSL sockets fail on a handshake_failure error with TLSv1.3 enabled
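Besides upgrading the JDK, a commonly used workaround on JDK 11 is to stop the client from offering TLSv1.3, for example by starting the JVM with `-Djdk.tls.client.protocols=TLSv1.2`. The sketch below does the same from code and checks which protocols a default client socket would enable. It assumes the property is set before anything (Jsoup included) initializes the JDK's default SSL context, and that Jsoup is left on the JDK's default socket factory; `TlsPin` and its methods are hypothetical names. Disabling TLSv1.3 is a stopgap, not a fix.

```java
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

class TlsPin {
    // Restricts the JDK's client-side TLS protocols to TLSv1.2 (TLSv1.3 disabled).
    // Assumption: called before any SSL/HTTPS use, since the JDK reads this
    // property when its default SSL context is first initialized.
    static void pinClientToTls12() {
        System.setProperty("jdk.tls.client.protocols", "TLSv1.2");
    }

    // Reports the protocols an unconnected default client-mode SSLSocket would enable.
    static String[] defaultClientProtocols() throws NoSuchAlgorithmException, IOException {
        try (SSLSocket socket = (SSLSocket) SSLContext.getDefault().getSocketFactory().createSocket()) {
            return socket.getEnabledProtocols();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Running on Java " + Runtime.version());
        pinClientToTls12();
        System.out.println("Enabled client protocols: " + Arrays.toString(defaultClientProtocols()));
        // Jsoup.connect(url).get() calls would follow here.
    }
}
```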

Pulco