
I am using Colly to scrape a website and I am trying to also get the TLS certificate that the site is presenting during the TLS handshake. I looked through the documentation and the response object but did not find what I was looking for.

According to the docs, I can customize some HTTP options by replacing the default HTTP round tripper. I tried setting custom GetCertificate and GetClientCertificate functions, assuming they would be called during the TLS handshake, but the print statements never run.

    // Instantiate default collector
    c := colly.NewCollector(
        // Visit only this domain: pkg.go.dev
        colly.AllowedDomains("pkg.go.dev"),
    )

    c.WithTransport(&http.Transport{
        TLSClientConfig: &tls.Config{
            GetCertificate: func(ch *tls.ClientHelloInfo) (*tls.Certificate, error) {
                fmt.Println("~~~GETCERT CALLED~~")
                return nil, nil
            },
            GetClientCertificate: func(cri *tls.CertificateRequestInfo) (*tls.Certificate, error) {
                fmt.Println("~~~GETCLIENTCERT CALLED~~")
                return nil, nil
            },
        },
    })

Please help me scrape TLS certificates using Colly.

1 Answer


Here is a snippet that gets the leaf certificate from a raw http.Response, in case you give up on getting the certificate through Colly.

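    // res is the *http.Response returned by an HTTPS request.
    // (Named leafB64 rather than tls to avoid shadowing the crypto/tls package.)
    leafB64 := ""
    if res.TLS != nil && len(res.TLS.PeerCertificates) > 0 {
        // PeerCertificates[0] is the server's leaf certificate.
        cert := res.TLS.PeerCertificates[0]
        // cert.Raw holds the DER-encoded certificate bytes.
        leafB64 = base64.StdEncoding.EncodeToString(cert.Raw)
    }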
Dmitry Harnitski