How to get dynamically loaded information from a web page with Python 3?

Question

I want to get some information from a web page. I use requests.get to fetch the page, but I cannot find what I want in the result. Checking carefully, I found that the info I want is in a list with a scrollbar. When I drag the scrollbar down, more and more info is loaded. So I guess the list is not fully loaded yet when I get the page with the requests module. I want to know what happens in this process and how I can gather the information I want. (I am not familiar with HTML.)

requests is a good tool for loading the source of a web page, and bs4 is another good tool for parsing it. Try both together! But your question may be a duplicate. Please do some searching before asking such a simple question on SO. - Frank AK, Jan 26 '18 at 05:01

7stud · Accepted Answer · 2018-02-01T10:27:10.510

I want to know what happens in this process

It sounds like scrolling triggers some JavaScript (JS) on the page, and that JS makes repeated requests to the server for more data. Unfortunately, the requests module cannot execute the JavaScript on an HTML page; all you get back is the text of the JS. The inability to execute JavaScript on a page in order to retrieve what the user actually sees has been a problem for a long time. Fortunately, smart programmers have largely solved that problem. You need to use a different module. Check out the selenium module.
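Here is a rough sketch of how that could look with selenium. I don't know your page, so the URL and the CSS selectors are placeholders you would have to replace, and I am assuming the list is a scrollable element that loads more rows when it reaches the bottom. The helper names (collect_scrolled_items, scrape) are my own, not part of selenium; the only selenium calls used are webdriver.Chrome(), driver.get(), driver.execute_script() and driver.quit(). It also assumes Chrome and a recent selenium are installed.

```python
import time

# Small JavaScript snippets run inside the browser via driver.execute_script().
COUNT_JS = "return document.querySelectorAll(arguments[0]).length;"
SCROLL_JS = (
    "var el = document.querySelector(arguments[0]);"
    "el.scrollTop = el.scrollHeight;"
)
TEXTS_JS = (
    "return Array.from(document.querySelectorAll(arguments[0]))"
    ".map(function (e) { return e.textContent.trim(); });"
)


def collect_scrolled_items(driver, container_css, item_css,
                           pause=1.0, max_rounds=50):
    """Scroll the container until no new items appear, then return their text.

    container_css and item_css are hypothetical selectors for the
    scrollable list and for one entry in it.
    """
    seen = driver.execute_script(COUNT_JS, item_css)
    for _ in range(max_rounds):
        # Jump to the bottom of the list so the page's JS requests more data.
        driver.execute_script(SCROLL_JS, container_css)
        time.sleep(pause)  # crude wait for the new rows to arrive
        now = driver.execute_script(COUNT_JS, item_css)
        if now == seen:  # nothing new was loaded, assume we reached the end
            break
        seen = now
    return driver.execute_script(TEXTS_JS, item_css)


def scrape(url, container_css, item_css):
    """Open the page in a real browser and collect every list entry."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()  # assumes Chrome is installed
    try:
        driver.get(url)
        return collect_scrolled_items(driver, container_css, item_css)
    finally:
        driver.quit()
```

You would call it with something like scrape("https://example.com/page", "#results", "#results li"), where all three arguments are made up for illustration. The fixed time.sleep is the simplest way to wait; if the page is slow, a longer pause or one of selenium's explicit waits is more reliable.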

I am not familiar with HTML

Scraping web pages can get really tricky really fast, and some web pages actively try to prevent computer programs from scraping their content, so you need to know both HTML and JavaScript in order to figure out what is going on.
