
Whenever I use the usual code, requests.get('https://example.com/example'), I get the entire text of the page dumped onto the screen. How can I read only part of the web page into Python?

  • Look at [How to read html from a url in python 3](https://stackoverflow.com/questions/24153519/how-to-read-html-from-a-url-in-python-3) – Aviv Yaniv Aug 24 '20 at 11:08

1 Answer


The easiest way is to use another library, beautifulsoup4. If you would rather use only requests, you can preprocess the response text with a regex first. Here are two examples.

With regex

import re

def preprocess(string: str) -> str:
    # Collapse every run of non-alphanumeric characters into a single space
    string = re.sub(r"[^A-Za-z0-9]+", " ", string)
    # Drop tokens that consist only of digits
    string = re.sub(r"\b\d+\b", "", string)
    return string
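
Since the question is about getting only part of the page, a regex can also pull out one specific piece instead of cleaning the whole text. The following is a minimal sketch under stated assumptions: `SAMPLE_HTML` is a made-up string standing in for `requests.get(url).text`, and `extract_title` is a hypothetical helper name, not part of requests. Regexes are fragile on real-world HTML, so treat this as an illustration only.

```python
import re

# Hypothetical sample standing in for requests.get(url).text
SAMPLE_HTML = """<html>
<head><title>Example Domain</title></head>
<body><h1>Hello</h1><p>Some body text.</p></body>
</html>"""

def extract_title(html: str) -> str:
    """Return the text inside the first <title> tag, or "" if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""

print(extract_title(SAMPLE_HTML))  # prints: Example Domain
```

The same pattern idea works for other simple, non-nested tags, but it breaks down quickly on nested or malformed markup, which is where a real parser helps.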

With Beautiful Soup

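import requests
from bs4 import BeautifulSoup

def get_clean(url: str) -> str:
    # Download the page and parse it with Python's built-in HTML parser
    r = requests.get(url).text
    soup = BeautifulSoup(r, "html.parser")
    # Return only the visible text, with all tags stripped
    return soup.get_text()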
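
To get just one part of the page rather than all of its text, Beautiful Soup can look up a single element first. Here is a rough sketch, assuming the part you want can be addressed with a CSS selector; `SAMPLE_HTML`, the `div#content p` selector, and the `get_part` helper are all made up for illustration, and in practice the HTML would come from `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for requests.get(url).text
SAMPLE_HTML = """<html>
<body>
<div id="nav"><p>Menu</p></div>
<div id="content"><p>First paragraph.</p></div>
</body>
</html>"""

def get_part(html: str, selector: str) -> str:
    """Return the text of the first element matching a CSS selector, or ""."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)  # None when nothing matches
    return element.get_text(strip=True) if element is not None else ""

print(get_part(SAMPLE_HTML, "div#content p"))  # prints: First paragraph.
```

Which selector to use depends entirely on the page; inspecting it in the browser's developer tools is the usual way to find one.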
Yagiz Degirmenci