
I'm working on a web scraping project.

When I run my code:

import requests

url = myurl  # the target URL

session = requests.Session()
response = session.get(url)
print(response.content)

The response.content looks like this:

<html><head><meta charset="utf-8"><script>function i700(){}i700.F20=function (){return typeof i700.O20.p60==='function'?i700.O20.p60.apply(i700.O20,arguments):i700.O20.p60;};i700.X70=function (){return typeof i700.v70.p60.............................

Inspecting the source webpage using Firefox Dev Tools, I found the data I need.


2 Answers


The response that you showed does not appear to be gzipped; response.content returns the response body as a raw byte string, which is likely not what you want.

To get the response as plain text, use response.text instead. From there, you should be able to search the string for the element you want using str.find().

Source: requests documentation.
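
A minimal sketch of that suggestion; the URL and the search string are placeholders, not values from the question:

import requests

url = "https://example.com"  # placeholder URL

session = requests.Session()
response = session.get(url)

html = response.text  # decoded text instead of raw bytes
index = html.find("some-element-id")  # placeholder search string
if index != -1:
    print(html[index:index + 200])  # show a snippet around the match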

  • Thanks Drew, I tried your solution, but `response.text` produces the same output as `response.content`. The source does not seem to be a binary byte string; it looks like obfuscated JavaScript functions. When I visit the site with a standard browser, the HTML source looks fine. When I try to get it with the requests library or Selenium, the source looks weird. – BlackMath May 21 '20 at 14:22

After some research, I found the solution. I noticed that my target website can detect Selenium as a bot, even though no automation is applied.

So, to access this kind of web page without getting detected, I used the ChromeOptions() class to add a few arguments:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# hide the "controlled by automated test software" switches
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

Source: Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
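
A minimal usage sketch with these options, assuming a local chromedriver on the PATH and a placeholder URL:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

driver = webdriver.Chrome(options=options)  # assumes chromedriver is on PATH
driver.get("https://example.com")           # placeholder URL
print(driver.page_source[:500])             # first part of the rendered HTML
driver.quit()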
