Good evening. I'm trying to access https://www.continente.pt/ and all I get is a blank page with a black bar at the top. I'm already using these options:

from selenium import webdriver

url = 'https://www.continente.pt/'
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
options.add_argument("--disable-infobars")
options.add_argument("--disable-extensions")
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options, executable_path=r'D:\doc\Fiverr\newMercado\chromedriver.exe')
driver.get(url)

This doesn't work; I'm still blocked from loading the content.

Blocked Continente.pt

2 Answers

Websites have different rules for spiders, usually summarized in the domain's robots.txt file. The file at https://www.continente.pt/robots.txt contains:

User-agent: *
Disallow: */private
Disallow: */search

This might suggest that the website owners don't want anyone scraping at least parts of the site. Depending on your script and on the website, they may also block spiders outright. You could also try a different webdriver, such as Firefox.
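As a rough illustration of how such rules can be checked from a script, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed is made up for this example. As far as I know, the standard-library parser only does plain prefix matching and does not interpret wildcard patterns such as */private, so treat its answers for those paths as approximate.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted above from https://www.continente.pt/robots.txt
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: */private",
    "Disallow: */search",
]


def is_allowed(robots_lines, url, user_agent="*"):
    """Hypothetical helper: ask the stdlib parser whether `url` may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules given as a list of lines, no network access
    return parser.can_fetch(user_agent, url)


# The homepage matches neither Disallow rule.
homepage_allowed = is_allowed(ROBOTS_LINES, "https://www.continente.pt/")
```

Note that robots.txt is advisory only. A blank page in the browser is more likely caused by bot detection or a script that fails to load than by these rules.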

You can also check whether your IP address is blocked. If it is, either restart your router (if it gets a dynamic IP address) or use a rotating-IP provider with your script.

  • Hi, thank you so much. I already tried Firefox, changing my IP, and so on. Unfortunately, nothing works. – gustavo matteo Oct 20 '20 at 17:56
  • How about going directly to the exact URL you need? It seems the site is redirecting to another page: https://www.continente.pt/pt-pt/public/Pages/homepage.aspx – Jahziel Rae Arceo Oct 20 '20 at 23:03
  • Could you send us the HTML response or the page source you retrieved? All we can see is the image. The source might give us more insight into your problem. – Jahziel Rae Arceo Oct 22 '20 at 16:23
  • Here, Jahziel. Thank you in advance https://ghostbin.com/paste/RZ2Uk – gustavo matteo Oct 22 '20 at 16:38
  • By the looks of it, the page is trying to load a JS script. Have you checked whether JavaScript is working correctly? Also check the Network tab in the browser's developer tools, since you're actually checking the page in maximized view. Once you find out what the problem is, please update us here. – Jahziel Rae Arceo Oct 25 '20 at 22:04
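One way to follow up on the comments above is to check whether the retrieved page source has any visible text outside its script tags. The sketch below uses only Python's standard html.parser. The names VisibleTextParser and visible_text are made up for this example, and it is a rough heuristic rather than a full renderer.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside <script> or <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Hypothetical helper: return the text a reader could see without JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

With Selenium, this could be called as visible_text(driver.page_source). An empty result would suggest the page depends on a script that did not load or run.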

Well, I found the answer by uninstalling all Chromium-based browsers and all their components. Then I installed Opera (based on Chrome 86) and downloaded ChromeDriver 86 as well. After that, I got access and haven't been blocked YET (I have already accessed the site more than 10 times and it still connects without problems).

I didn't add any new code, just this:

from selenium import webdriver

url = "https://www.website.com"

# No extra options; assumes ChromeDriver is on the PATH
driver = webdriver.Chrome()
driver.get(url)
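Since the fix above came down to pairing a browser with a driver of the same major version (86 in both cases), a small check like the following may help catch a mismatch early. This is only a sketch. The helper names are made up, and the capability keys mentioned in the comments are what a Chrome session typically reports, so treat them as an assumption for your setup.

```python
def major_version(version):
    """Return the leading integer of a version string such as '86.0.4240.22 (...)'."""
    return int(version.strip().split(".", 1)[0])


def versions_match(browser_version, driver_version):
    """Hypothetical helper: True when browser and driver share a major version."""
    return major_version(browser_version) == major_version(driver_version)


# With a running Chrome session, the two strings can typically be read from:
#   driver.capabilities["browserVersion"]
#   driver.capabilities["chrome"]["chromedriverVersion"]
# (assumed keys; check what your driver actually reports)
```

For example, versions_match("86.0.4240.75", "86.0.4240.22 (abc)") compares 86 with 86, while a browser at 87 with a driver at 86 would not match.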