Whenever I use the normal code:
requests.get('https://example.com/example')
I get the entire text of the website dumped onto the screen. How would I load only part of the web page into Python?

Dylan H
Look at [How to read html from a url in python 3](https://stackoverflow.com/questions/24153519/how-to-read-html-from-a-url-in-python-3) – Aviv Yaniv Aug 24 '20 at 11:08
1 Answer
The easiest way is to use another library, beautifulsoup4. If you want to stick with requests alone,
you can use a regex to preprocess the text before using it. Here are two examples.
With regex
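import re

def preprocess(string: str) -> str:
    # Collapse every run of non-alphanumeric characters into a single space
    string = re.sub("[^A-Za-z0-9]+", " ", string)
    # Remove standalone runs of digits
    string = re.sub(r"$\d+\W+|\b\d+\b|\W+\d+$[^A-Za-z0-9]", "", string)
    return string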
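If the goal is to pull one specific piece out of the response text rather than clean all of it, a regex search with a capture group may be enough. The following is only a rough sketch: `SAMPLE_HTML` is a made-up string standing in for `requests.get(url).text`, and `extract_title` is a hypothetical helper name, not part of requests. Regexes are fragile on real-world HTML, so treat this as suitable for simple, predictable pages only.

```python
import re
from typing import Optional

# Made-up sample standing in for requests.get(url).text (assumption for illustration)
SAMPLE_HTML = """<html><head><title>Example Domain</title></head>
<body><h1>Example Domain</h1><p>This domain is for use in examples.</p></body></html>"""


def extract_title(html: str) -> Optional[str]:
    """Return the text inside the first <title> tag, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


print(extract_title(SAMPLE_HTML))
```

The same pattern works for any other tag by changing the tag name in the regex, as long as the tag is not nested inside itself.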
With Beautiful Soup
import requests
from bs4 import BeautifulSoup

def get_clean(url: str) -> str:
    # Download the page and return its text content with the HTML tags stripped
    r = requests.get(url).text
    soup = BeautifulSoup(r, "html.parser")
    return soup.get_text()
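To get only part of the page rather than all of its text, one option is to look up a single element and read the text from that element alone. This is a minimal sketch, assuming the part you want has an `id` attribute; `SAMPLE_HTML`, the `content` id, and the `get_section_text` helper are all made up for illustration. In practice you would pass `requests.get(url).text` instead of the sample string.

```python
from bs4 import BeautifulSoup

# Made-up sample standing in for requests.get(url).text (assumption for illustration)
SAMPLE_HTML = """
<html><body>
  <h1>Example Domain</h1>
  <div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
  <div id="footer"><p>Footer text.</p></div>
</body></html>
"""


def get_section_text(html: str, element_id: str) -> str:
    """Return the text of the element with the given id, or "" if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find(id=element_id)  # first element whose id matches
    if section is None:
        return ""
    # Join the text fragments with spaces and trim surrounding whitespace
    return section.get_text(separator=" ", strip=True)


print(get_section_text(SAMPLE_HTML, "content"))
```

If the element has no id, `soup.find` also accepts a tag name, and `soup.select` accepts CSS selectors.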

Yagiz Degirmenci