beautifulsoup-grab-visible-webpage-text without a file ending in .html

Question

I liked the answers on this page: BeautifulSoup Grab Visible Webpage Text

But my page's URL doesn't end in .html; it is: https://biogmagscience.net

There must be a simple solution to this.

Cheers

DNS lookup is failing on that website. Did you mean https://biomagscience.net/? - LazyCoder, Jul 28 '19 at 12:19

Andrej Kesely · Answer 1 · 2019-08-04T15:37:57.570

You have a typo in your URL; it should be https://biomagscience.net/. This script prints the page's visible text using the get_text() method:

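import requests
from bs4 import BeautifulSoup

url = 'https://biomagscience.net/'
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# Remove elements whose content is never rendered as visible text
for tag in soup.select('style, script, [style*="display:none"]'):
    tag.extract()

# Print the remaining text, one string per line
print(soup.get_text(strip=True, separator='\n'))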

Prints:

Best Magnets For Healing | Biomagnetic Therapy Products
The Future of Health & Well-Being —Today!
Advanced Therapy for Vitality, Nerve Regeneration & Pain Relief of Acute/Chronic Injuries & Illness
Acute Injuries
•
Alzheimer’s
•
Arthritis
•
Back Pain
•
Chronic Illness
•
EMF
•
Joint Pain
•
Muscle Pain
Magnet Therapy Articles
•
Products
BiomagScience

...and so on.
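
To try the same filtering without a network request, here is a minimal sketch that runs it on an inline HTML string. The function name visible_text and the sample markup are made up for illustration, and it uses the built-in 'html.parser' instead of 'lxml' only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Return the text of an HTML document without styles, scripts or hidden tags."""
    # 'html.parser' ships with Python; 'lxml' would also work if installed
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose content is never rendered as visible text
    for tag in soup.select('style, script, [style*="display:none"]'):
        tag.extract()
    return soup.get_text(strip=True, separator="\n")


# Hypothetical sample page, not taken from the real site
sample = """<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><p>Hello</p><script>var x = 1;</script>
<p style="display:none">hidden</p><p>World</p></body></html>"""

print(visible_text(sample))
# Demo
# Hello
# World
```

Note that the attribute selector only matches the exact substring display:none, so an inline style written as "display: none" (with a space), or an element hidden through a CSS class, would still show up in the output.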

score 0 · Answer 2 · answered Jul 28 '19 at 12:15

https://biogmagscience.net is the URL, not a file name. Go to your website and download the source code; it will be HTML.
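
To illustrate the difference between a URL and a file name, here is a small sketch using only the standard library. The helper names url_extension and looks_like_html are hypothetical. The assumption is that what tells you a response is HTML is the Content-Type header the server sends, not any extension in the URL path.

```python
import posixpath
from urllib.parse import urlparse


def url_extension(url: str) -> str:
    """Return the file extension found in the URL path, or '' if there is none."""
    return posixpath.splitext(urlparse(url).path)[1]


def looks_like_html(content_type: str) -> bool:
    """Return True if a Content-Type header value declares an HTML document."""
    # Strip parameters such as '; charset=UTF-8' before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


print(repr(url_extension("https://biomagscience.net/")))  # ''
print(looks_like_html("text/html; charset=UTF-8"))        # True
```

With requests, the header value would come from response.headers.get('Content-Type', ''), so a page served from a bare domain can be parsed exactly like one ending in .html.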

miara
