When browsing a site, we have two options: "View source" and "View page source". BS4 makes it possible to get data from "View page source". Is it also possible to get data from "View source"? If not, is there any other way to get it? I would really appreciate your help!
-
What is the difference between "view source" and "view page source"? I.e., what exactly do you want to achieve? – RJ Adriaansen Jan 09 '22 at 07:44
-
@RJ Adriaansen With "view source", I see more data (for example, the ID of a book or its position in the rating of an online store), but with "view page source" this information is absent. Therefore, I am looking for a way to get the data from the first option. While researching this problem, I found another approach: if you click "view source", then select Network on the top panel and select XHR (or FETCH/XHR), you will see some links. If I can find a way to reach them using Python, that would be very cool. If you have any ideas, I will be glad to hear them! – Никита Пешков Jan 09 '22 at 08:33
-
My guess is that the site loads its contents dynamically, hence the contents don't appear in the source code. You can use [Selenium](https://stackoverflow.com/questions/7861775/python-selenium-accessing-html-source) to retrieve the full page source. – RJ Adriaansen Jan 09 '22 at 08:57
-
@RJ Adriaansen Your link helped me solve the problem! Thanks for your time! – Никита Пешков Jan 09 '22 at 09:23
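The XHR route mentioned in the comments can also be followed from Python without a browser: a link listed under Network > XHR often returns JSON, which can be requested directly. The sketch below uses only the standard library and is not the method the thread settled on. The endpoint URL you pass to `fetch_json`, the payload keys (`items`, `id`, `rating_position`), and the sample payload are all hypothetical; the real ones have to be read from the Network tab of the site in question.

```python
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """Download `url` and decode the response body as JSON."""
    request = urllib.request.Request(
        url,
        # Some sites reject requests without a browser-like User-Agent.
        headers={"Accept": "application/json", "User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_books(payload):
    """Return (id, rating_position) pairs from a decoded payload.

    The "items", "id" and "rating_position" keys are hypothetical;
    adapt them to the structure the real endpoint returns.
    """
    items = payload.get("items", [])
    return [(item["id"], item.get("rating_position")) for item in items]


# Hypothetical payload standing in for a real XHR response.
sample_payload = json.loads(
    '{"items": [{"id": 101, "rating_position": 3}, {"id": 102}]}'
)
books = extract_books(sample_payload)
print(books)
```

With a real endpoint, the call would look like `extract_books(fetch_json(xhr_url))`, where `xhr_url` is one of the links copied from the XHR list.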
1 Answer
Solution:
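import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")  # run Chrome without opening a window
driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get("my_URL")  # placeholder: replace with the URL of the page
    time.sleep(10)  # give the page's JavaScript time to load the data
    html_source = driver.page_source  # HTML of the rendered page
finally:
    driver.quit()  # always shut the browser down, even on errors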
The headless option launches the browser without displaying a window. The pause gives the page's JavaScript time to run; without it, the data we need may not have loaded yet. As a result, page_source contains the same data that "View source" shows.
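A fixed 10-second pause can be longer than needed on a fast page and too short on a slow one. One possible alternative, sketched here under assumptions, is to poll `page_source` until a piece of text that only appears after the JavaScript has run shows up. The helper name `wait_for_marker` and the marker string are hypothetical, and `_FakeDriver` is only a stand-in so the example runs without a browser; with Selenium you would pass the real `driver` instead.

```python
import time


def wait_for_marker(driver, marker, timeout=10.0, poll_interval=0.5):
    """Poll driver.page_source until `marker` appears, then return the HTML.

    Raises TimeoutError if the marker is still missing after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found after {timeout} s")
        time.sleep(poll_interval)


class _FakeDriver:
    """Stand-in for a Selenium driver: the marker shows up on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads >= 3:
            return "<div id='rating'>3</div>"
        return "<div></div>"


fake = _FakeDriver()
html_source = wait_for_marker(fake, "id='rating'", timeout=2.0, poll_interval=0.01)
print(html_source)
```

Selenium also ships its own waiting helper, `WebDriverWait` in `selenium.webdriver.support.ui`, which can wait for a condition on the page instead of a substring in the source.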

Никита Пешков