
I have a script that needs to find elements in a page's HTML, but when it accesses the main page, this page shows up instead: https://gyazo.com/84d0e5b7a73c97db5b780f18d0ba3f89

My questions are these:

  • How can I bypass it?
  • How can I get cookies via cfscrape.create_scraper() or requests.session()?

My script:

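import datetime

import bs4
import cfscrape

# `headers` was not defined in the posted snippet; this User-Agent is a placeholder.
headers = {"User-Agent": "Mozilla/5.0"}

# create_scraper() returns a requests.Session subclass.
s = cfscrape.create_scraper()
url = input("[" + str(datetime.datetime.now()) + "] [INPUT] > URL # ")
product = s.get(url, headers=headers, allow_redirects=True)
soup = bs4.BeautifulSoup(product.text, "html.parser")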
Martin Gergov
  • Try passing an authentic `User-Agent` when requesting the web page: https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python – Mohit Solanki Mar 03 '19 at 10:04
  • already doing that – lilyoungpolo Mar 03 '19 at 10:30
  • `requests` has built-in cookie support (https://requests.readthedocs.io/en/master/user/quickstart/#cookies). Your problem here is the first part, the blocker page: those folks generate their content via JavaScript, so if you don't run JavaScript, you don't get any content. You'll probably have to go through a headless browser to scrape the page properly. And at least read the cfscrape docs; they mention both cookies and sessions. – Masklinn Jan 31 '20 at 07:18
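On the cookie part of the question, here is a minimal sketch of reading the cookies a `requests` session has collected. The helper names `session_cookies` and `fetch_and_list_cookies`, the URL, and the User-Agent string are hypothetical and only for illustration. Since `cfscrape.create_scraper()` returns a `requests.Session` subclass, the same `.cookies` attribute should be available on the scraper object too. This only reads cookies; it does not get past the JavaScript challenge page.

```python
import requests


def session_cookies(session):
    """Return the cookies currently stored on a requests session as a plain dict."""
    return session.cookies.get_dict()


def fetch_and_list_cookies(url, user_agent):
    """GET `url` with a fresh session and return (response, cookies collected so far)."""
    session = requests.Session()
    # Send the same User-Agent on every request made through this session.
    session.headers.update({"User-Agent": user_agent})
    response = session.get(url, allow_redirects=True)
    return response, session_cookies(session)


if __name__ == "__main__":
    # Hypothetical URL and User-Agent, for illustration only.
    response, cookies = fetch_and_list_cookies("https://example.com/", "Mozilla/5.0")
    print(response.status_code, cookies)
```

Passing the object returned by `cfscrape.create_scraper()` to `session_cookies` instead of a plain `requests.Session()` should behave the same way, assuming the scraper has already made a request.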

0 Answers