-1

I would like to extract the full source code of a domain with all possible paths. For example page source of: mywebsite.com/index.html AND mywebsite.com/aboutus/ AND mywebsite.com/contactus ect.

Is there an easy way to do this?

  • 1
    By source code, you mean only the frontend and the resulting minified JavaScript right? The example you told are not subdomains. You want different possible paths of that website; is that right, too? (examples of subdomains are `www.mywebsite.com` and `api.mywebsite.com`) – Ali Tou Nov 20 '21 at 04:47
  • Maybe duplicate? -> [Python Selenium accessing HTML source](https://stackoverflow.com/questions/7861775/python-selenium-accessing-html-source) – HedgeHog Nov 20 '21 at 18:30

1 Answers1

0

Not sure if this is what you're looking for but you can get all the html (source code) that makes up a page with beautiful soup

from bs4 import BeautifulSoup
import urllib.request

URL = "https://www.mywebstie.com" 
try:
    page = urllib.request.urlopen(URL)
except:
    print("An error occured.")

soup = BeautifulSoup(page, 'html.parser')
print(soup)

The 'soup' variable becomes the entire pages html. You would just have to change the URL variable to each subdomain.

Lobstergoat
  • 113
  • 1
  • 9