0

I'm trying to put together a simple web crawler using the jsoup library. However, when calling "Jsoup.connect(url).get()" on some sites, I get the error below.

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

There are a number of other questions about this error, but all of them suggest resolving it by manually creating a cert for the site in question. Since I'm trying to do a web crawler that will connect to many sites, that's not really a solution.

Is there a recommended way to resolve this? For a simple web crawler security is not particularly a concern, so the authenticity of the cert does not matter.

Daniel
  • Please show your code and state the URL to the server. Since Stack Overflow hides the Close reason from you: *Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve).* – jww Jun 11 '17 at 17:02

2 Answers

0

The solution I'm using for now is Option 2 from a related question: "Accept server's self-signed ssl certificate in Java client".
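For readers who don't want to follow the link, here is a minimal sketch of that approach, assuming Option 2 there means installing a trust manager that accepts every certificate. The class and method names (`TrustAllSsl`, `trustAllSocketFactory`) are made up for illustration, and this turns off certificate checking entirely, so it is only reasonable where authenticity really doesn't matter.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper; the class and method names are illustrative only.
final class TrustAllSsl {

    // Trust manager that accepts every certificate chain without checking it.
    static final class TrustAllManager implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    static final TrustAllManager TRUST_ALL = new TrustAllManager();

    // Builds a socket factory whose TLS handshakes skip certificate validation.
    static SSLSocketFactory trustAllSocketFactory() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build trust-all SSL context", e);
        }
    }

    public static void main(String[] args) {
        SSLSocketFactory factory = trustAllSocketFactory();
        // To apply it JVM-wide before crawling (not done in this sketch):
        //   javax.net.ssl.HttpsURLConnection.setDefaultSSLSocketFactory(factory);
        System.out.println("factory created: " + (factory != null));
    }
}
```

Setting the factory as the JVM default affects every HTTPS connection in the process, not just the crawler's requests, so it is worth keeping this out of any code that handles sensitive data.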

Daniel
-1

You should disable TLS certificate validation by setting validateTLSCertificates(false):

Document document = Jsoup.connect(url).timeout(10000).validateTLSCertificates(false).get();
flavio.donze
  • Adding this is introducing a new error `javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure` – Daniel Jun 11 '17 at 15:56
  • That's strange, seems to have worked for others: https://stackoverflow.com/questions/40742380/how-to-resolve-jsoup-error-unable-to-find-valid-certification-path-to-requested/40742828#40742828 Could you post some code? Please make sure you remove all other leftover code. e.g. trying to get the certificate to work. Try only this line of code. – flavio.donze Jun 12 '17 at 04:53