I would like to extract the full source code of a domain with all possible paths. For example page source of: mywebsite.com/index.html AND mywebsite.com/aboutus/ AND mywebsite.com/contactus ect.
Is there an easy way to do this?
I would like to extract the full source code of a domain with all possible paths. For example page source of: mywebsite.com/index.html AND mywebsite.com/aboutus/ AND mywebsite.com/contactus ect.
Is there an easy way to do this?
Not sure if this is what you're looking for but you can get all the html (source code) that makes up a page with beautiful soup
from bs4 import BeautifulSoup
import urllib.request
URL = "https://www.mywebstie.com"
try:
page = urllib.request.urlopen(URL)
except:
print("An error occured.")
soup = BeautifulSoup(page, 'html.parser')
print(soup)
The 'soup' variable becomes the entire pages html. You would just have to change the URL variable to each subdomain.