I'm trying to fetch the body of the https://www.avito.ru/moskva page and get a 200 status.

req, err := http.NewRequest("GET", "https://www.avito.ru/moskva", nil)
if err != nil {
    panic(err)
}
req.Header.Add("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0")
req.Header.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")

client := &http.Client{}
res, err := client.Do(req)
if err != nil {
    panic(err)
}
defer res.Body.Close()

fmt.Println(res.Status)
printBody(res) // prints body of page

The output:

403 Forbidden
"security stub from the site (says that my IP is banned)"

I can open this page in a browser without any warnings.

I successfully fetched the body with Python:

import requests

session = requests.Session()
session.headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0',
    'Accept-Language': 'ru',
}
print(session.get("https://www.avito.ru/moskva").text)

curl also works well, even without adding any headers:

curl https://www.avito.ru/moskva
kostix
user
  • For testing, try removing or changing the agent in the Go version (the agent content can sometimes be an issue: https://stackoverflow.com/a/59105470/6309). I understand the same agent works with the Python version, but for testing in the Go version, I would try that first. – VonC Oct 09 '20 at 06:25
  • What you're seeing has nothing to do with Go or with HTTP; it's so-called "user-agent (or web request) fingerprinting": the server tries to _guess_ whether the incoming request comes from a browser (or a mobile app) running on a user's device _and operated by a human,_ or from some automated process. What you appear to be doing is called "web scraping", and many if not most commercial sites try hard to combat scraping attempts (for supposedly obvious reasons). There are ways to counter attempts at fingerprinting. – kostix Oct 09 '20 at 09:22
  • You might start, for instance, [here](https://www.reddit.com/r/golang/comments/hq9d1s/how_would_i_make_more_chrome_like_requests_with/), to get more information to base your further research on. – kostix Oct 09 '20 at 09:23

1 Answer

The problem seems to be the TLS version used. Setting the maximum version to TLS 1.2 (tls.VersionTLS12) seems to work:

package main

import (
    "crypto/tls"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Cap the negotiated TLS version at 1.2.
    tr := &http.Transport{
        TLSClientConfig: &tls.Config{
            MaxVersion: tls.VersionTLS12,
        },
    }
    client := &http.Client{Transport: tr}
    req, err := http.NewRequest("GET", "https://www.avito.ru/moskva", nil)
    if err != nil {
        panic(err)
    }
    resp, err := client.Do(req)
    if err != nil {
        panic(err) // resp is nil on error, so stop here
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Print(string(body))
}

If you switch to tls.VersionTLS13, it returns a 403 status code, so I'm guessing TLS 1.3 is the version negotiated by default for this host. In Chrome you can see that the connection uses TLS 1.3:

[screenshot: Chrome showing the connection uses TLS 1.3]

But I'm not sure why it would return different results for TLS 1.3 and TLS 1.2.

Bertrand Martel
  • You saved my ass bro. God bless you. – user Oct 09 '20 at 17:34
  • The strange thing is that it works with OpenSSL 1.1.1 and TLS 1.3 but not with Golang and TLS 1.3. I've explicitly checked that I send exactly the same HTTP request inside. So it looks like the server does not like something in the TLS stack of Golang. – Steffen Ullrich Oct 09 '20 at 17:46
  • @SteffenUllrich Interesting, my openssl doesn't have TLS 1.3 support. Does `curl "https://www.avito.ru/moskva" --tlsv1.3` work for you? – Bertrand Martel Oct 09 '20 at 17:49
  • @BertrandMartel: Yes, curl with TLS 1.3 (based on OpenSSL) works too. So do Python and Perl, both also based on OpenSSL. Firefox works with TLS 1.3 too (NSS library), and so does Chrome (BoringSSL, which is a fork of OpenSSL). Only Golang with its own TLS stack seems to have problems. But the TLS handshake actually works; the server just does not want to serve the right page for some reason. – Steffen Ullrich Oct 09 '20 at 18:10
  • Now it doesn't work on Go version 1.9, but we can set CipherSuites like `tls.Config{ CipherSuites: []uint16{}}` – newpdv Aug 26 '22 at 14:06