
I'm making a scraper with bs4 that scrapes this userscripts website, but I'm running into an issue where I can't remove the whitespace. Nothing I've tried works. Can someone help me?

from bs4 import BeautifulSoup
import requests
import os

url = "https://openuserjs.org"

source = requests.get(url)

soup = BeautifulSoup(source.text,'lxml')

os.system('cls')

for Titles in soup.findAll("a", {"class": "tr-link-a"}):
    print(Titles.text.replace("Microsoft is aquiring GitHub", "").replace("TOS Changes", "").replace("Google Authentication Deprecation 2.0", "").replace("Server Maintenance", "").replace("rawgit.com Deprecation and EOL", ""))
MrPigbot

1 Answer


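To get the titles without the announcements, try the CSS selector below. It matches only a <b> element that is a direct child of an a.tr-link-a link, and strip() removes the whitespace around each title.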

for Titles in soup.select("a.tr-link-a>b"):
    print(Titles.text.strip())

Output:

TopAndDownButtonsEverywhere
Anti-Adblock Killer | Reek
YouTube Center
EasyVideoDownload
AdsBypasser
Endless Google
YouTube +
Shadow Selection
bongacamsKillAds
Google View Image
Youtube - Restore Classic
Webcomic Reader
Shiki Rating
Warez-BB +
cinemapress
Google Hit Hider by Domain (Search Filter / Block Sites)
Chaturbate Clean
google cache comeback
translate.google tooltip
Amazon Smile Redirect
oujs - JsBeautify
IMDb 'My Movies' enhancer
EX-百度云盘
Wide Github
DuckDuckGo Extended
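
To see why the selector drops the announcements, here is a minimal offline sketch. The HTML fragment is hypothetical, not copied from openuserjs.org; it assumes that script links wrap their title in <b> while announcement links do not, which is what the selector above relies on. It uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page source):
# the announcement link has no <b>, the script links do.
html = """
<div>
  <a class="tr-link-a" href="/announcements/1">
      Server Maintenance
  </a>
  <a class="tr-link-a" href="/scripts/a">
      <b>  YouTube Center </b>
  </a>
  <a class="tr-link-a" href="/scripts/b"><b>AdsBypasser</b> <span>v1</span></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "a.tr-link-a > b" selects <b> elements that are direct children of the link,
# so the announcement link (no <b>) contributes nothing.
titles = [b.get_text(strip=True) for b in soup.select("a.tr-link-a > b")]
print(titles)  # ['YouTube Center', 'AdsBypasser']
```

get_text(strip=True) does the same job as .text.strip() here; either one should work on the real page if its markup matches the assumption.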

If you would rather use findAll(), skip the links that have no <b> child:

for Titles in soup.findAll("a", {"class": "tr-link-a"}):
    bold = Titles.find('b')  # None when the link has no <b> child
    if bold:
        print(bold.text.strip())

Code:

from bs4 import BeautifulSoup
import requests

url = "https://openuserjs.org"

source = requests.get(url)

soup = BeautifulSoup(source.text, 'lxml')

# Print only the links that contain a <b> element, trimmed of whitespace.
for Titles in soup.findAll("a", {"class": "tr-link-a"}):
    bold = Titles.find('b')  # None when the link has no <b> child
    if bold:
        print(bold.text.strip())
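
As for why the chained replace() calls in the question leave gaps: replacing an announcement title with "" still leaves any surrounding whitespace in the string, and print() still emits a line for it. A rough sketch of cleaning the strings instead is below. The clean_title helper and the sample strings are made up for illustration and are not taken from the page.

```python
def clean_title(raw: str) -> str:
    """Collapse each run of whitespace to one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so joining with " " normalizes the string.
    return " ".join(raw.split())


# Made-up examples of what link text can look like after extraction.
raw_titles = ["\n      YouTube Center\n    ", "\tEndless   Google ", "\n\n"]

# Drop entries that are empty once cleaned, so no blank lines get printed.
cleaned = [t for t in map(clean_title, raw_titles) if t]
print(cleaned)  # ['YouTube Center', 'Endless Google']
```

strip() alone is enough when only the ends carry whitespace; the split/join form also handles newlines or tabs in the middle of a title.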
KunduK