How can I access this type of site using requests?

Question

This is the first time I've run into a site that won't let me access the webpage. I'm not sure why, and I can't figure out how to scrape it.

My attempt:

import requests
from bs4 import BeautifulSoup

def html(url):
    return BeautifulSoup(requests.get(url).content, "lxml")

url = "https://www.g2a.com/"

soup = html(url)

print(soup.prettify())

Output:

<html>
 <head>
  <title>       
   Access Denied
  </title>
 </head>
 <body>
  <h1>
   Access Denied
  </h1>
  You don't have permission to access "http://www.g2a.com/" on this server.
  <p>
   Reference #18.4d24db17.1592006766.55d2bc1
  </p>
 </body>
</html>

I've looked into it for a while now, and from what I found there is supposed to be some kind of token involved (access, refresh, etc.).

I also came across action="/search", but I wasn't sure what to do with just that.

Does this answer your question? [Scraper in Python gives "Access Denied"](https://stackoverflow.com/questions/41982475/scraper-in-python-gives-access-denied) — Humayun Ahmad Rajib, Jun 13 '20 at 07:52
I think that is to prevent "robots" from accessing the page and the above comment suggests another post with a fix — dstrants, Jun 13 '20 at 18:34

score 1 · Accepted Answer · answered Jun 13 '20 at 00:23

The request needs to send some HTTP headers for the site to return the page (here, Accept-Language):

import requests
from bs4 import BeautifulSoup

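# Sent with every request; without it the server answered "Access Denied"
headers = {'Accept-Language': 'en-US,en;q=0.5'}

def html(url):
    # Fetch the page with the extra header and parse it with lxml
    return BeautifulSoup(requests.get(url, headers=headers).content, "lxml")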

url = "https://www.g2a.com/"

soup = html(url)

print(soup.prettify())

Prints:

<!DOCTYPE html>
<html lang="en-us">
 <head>
  <link href="polyfill.g2a.com" rel="dns-prefetch"/>
  <link href="images.g2a.com" rel="dns-prefetch"/>
  <link href="id.g2a.com" rel="dns-prefetch"/>
  <link href="plus.g2a.com" rel="dns-prefetch"/>

... and so on.
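
To see why the extra header can matter, it may help to look at what requests sends when no headers are given. The sketch below is not part of the original answer; `prepared_headers` is a hypothetical helper name. It builds the request without sending it, so it needs no network access. By default requests identifies itself with a `python-requests/...` User-Agent and sends no Accept-Language at all.

```python
import requests

# Header taken from the answer above.
extra_headers = {"Accept-Language": "en-US,en;q=0.5"}


def prepared_headers(url, headers=None):
    """Return the headers requests would send for a GET, without sending it.

    Hypothetical helper for illustration only.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    # prepare_request merges the session's default headers with ours
    return dict(session.prepare_request(request).headers)


if __name__ == "__main__":
    url = "https://www.g2a.com/"
    print(prepared_headers(url))                 # defaults only
    print(prepared_headers(url, extra_headers))  # defaults plus Accept-Language
```

Comparing the two printed dictionaries with the request headers shown in the browser's developer tools is one way to spot what the plain script is missing.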

answered Jun 13 '20 at 00:23

Andrej Kesely


Are there multiple headers that would have worked in this case? If not, how did you know which header would work? – thehammerons Jun 13 '20 at 00:39
@thehammerons In my case, specifying only `Accept-Language` worked, but if that isn't working for you, you can try `User-Agent`, etc. Look at Firefox Developer Tools for more info (Chrome has something similar). – Andrej Kesely Jun 13 '20 at 00:44
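
The trial-and-error approach from the comment above could be sketched roughly as follows. This is only an illustration: `first_working_headers` and `CANDIDATE_HEADERS` are hypothetical names, the User-Agent string is just an example browser-like value, and the check for a blocked response (a non-2xx status or "Access Denied" in the body) is an assumption based on the output shown in the question.

```python
import requests

# Candidate header sets, tried in order. The Accept-Language value comes from
# the answer; the User-Agent string is only an illustrative browser-like value.
CANDIDATE_HEADERS = [
    {"Accept-Language": "en-US,en;q=0.5"},
    {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0"},
    {
        "Accept-Language": "en-US,en;q=0.5",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0",
    },
]


def first_working_headers(url, candidates=CANDIDATE_HEADERS, get=requests.get):
    """Return (headers, response) for the first candidate that is not rejected.

    Returns (None, None) if every candidate is rejected.
    """
    for headers in candidates:
        response = get(url, headers=headers, timeout=10)
        # Assumption: a blocked request shows up as an error status
        # or as an "Access Denied" page, as in the question's output.
        if response.ok and "Access Denied" not in response.text:
            return headers, response
    return None, None


if __name__ == "__main__":
    headers, response = first_working_headers("https://www.g2a.com/")
    print("Working headers:", headers)
```

The `get` parameter is there only so the function can be exercised without network access; in normal use the default `requests.get` is enough.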
