Get full source code from a website python selenium

Question

I would like to extract the full source code of a domain, covering all possible paths. For example, the page source of mywebsite.com/index.html, mywebsite.com/aboutus/, mywebsite.com/contactus, etc.

Is there an easy way to do this?

By source code, you mean only the frontend and the resulting minified JavaScript, right? The examples you gave are not subdomains. You want the different possible paths of that website; is that right, too? (Examples of subdomains are `www.mywebsite.com` and `api.mywebsite.com`.) — Ali Tou, Nov 20 '21 at 04:47
Maybe duplicate? -> [Python Selenium accessing HTML source](https://stackoverflow.com/questions/7861775/python-selenium-accessing-html-source) — HedgeHog, Nov 20 '21 at 18:30

score 0 · Answer 1 · answered Nov 20 '21 at 18:03

Not sure if this is what you're looking for, but you can get all the HTML (source code) that makes up a page with Beautiful Soup:

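from bs4 import BeautifulSoup
import urllib.error
import urllib.request

URL = "https://www.mywebsite.com"

try:
    # Download the raw HTML of the page
    page = urllib.request.urlopen(URL)
except urllib.error.URLError as err:
    # Stop here: without a response there is nothing to parse
    raise SystemExit(f"An error occurred: {err}")

# Parse the response into a BeautifulSoup tree and print it as HTML
soup = BeautifulSoup(page, 'html.parser')
print(soup)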

The 'soup' variable holds the entire page's HTML. You would just have to change the URL variable to each path you want (for example /aboutus/ or /contactus) and run it again.
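
Rather than editing the URL variable by hand for every page, one option is to start at the home page and follow the links that stay on the same host. The following is only a rough sketch of that idea, not a complete crawler. It uses the standard library alone so it runs without extra installs, and the names `LinkCollector`, `fetch_html`, `crawl_site` and `max_pages` are made up for this example. It assumes every page you care about is reachable through ordinary `<a href>` links, and it does not execute JavaScript.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download one page and return its source as text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl_site(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl of same-host links; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            # Skip pages that fail to load (URLError is an OSError subclass)
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop any #fragment
            link, _fragment = urldefrag(urljoin(url, href))
            parsed = urlparse(link)
            same_host = parsed.scheme in ("http", "https") and parsed.netloc == host
            if same_host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl_site("https://www.mywebsite.com/")` would return a dictionary mapping each visited URL to its page source. The `fetch` parameter is there so the download step can be swapped out. With Selenium, for instance, you could pass a function that calls `driver.get(url)` and returns `driver.page_source`, which should also capture markup generated by JavaScript.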
