I wanted to write a scraper as my first program, just to learn the basics, but I'm having trouble getting it to show more than one result.

The idea is to go to a forum (http://blackhatworld.com), scrape all thread titles, and check each one against a string. If a title contains the word "free", the program prints it; otherwise it doesn't.

Here's the current code:

import requests
from bs4 import BeautifulSoup


page = requests.get('https://www.blackhatworld.com/')
content = BeautifulSoup(page.content, 'html.parser')
threadtitles = content.find_all('a', class_='PreviewTooltip')


n=0
for x in range(len(threadtitles)):
    test = list(threadtitles)[n]
    test2 = list(test)[0]
    if test2.find('free') == -1:
        n=n+1
    else:
        print(test2)
        n=n+1

This is the result of running the program: https://i.gyazo.com/6cf1e135b16b04f0807963ce21b2b9be.png

As you can see, it checks for the word "free" and that works, but it only shows the first result even though there are several more on the page.

petezurich
Steel Hard

2 Answers

By default, string comparison is case-sensitive ("FREE" != "free"), so titles that spell the word with capital letters are skipped. To solve your problem, first convert test2 to lowercase:

test2 = list(test)[0].lower()
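
To illustrate the difference without hitting the network, here is a minimal sketch. The titles in the list are made-up sample data, not real threads from the forum.

```python
# Hypothetical sample titles (assumed for illustration only).
titles = ["FREE proxies inside", "Get a free ebook", "Paid service only"]

# str.find is case-sensitive: it returns -1 when the exact
# substring "free" does not occur, so "FREE" is missed.
case_sensitive = [t for t in titles if t.find("free") != -1]

# Lowercasing each title first makes the check case-insensitive.
case_insensitive = [t for t in titles if "free" in t.lower()]

print(case_sensitive)
print(case_insensitive)
```

The second list also picks up the title that spells the word in capitals, which the first check skips.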
João Eduardo

To solve your problem and simplify your code, try this:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.blackhatworld.com/')
content = BeautifulSoup(page.content, 'html.parser')
threadtitles = content.find_all('a', class_='PreviewTooltip')

count = 0

for title in threadtitles:
    if "free" in title.get_text().lower():
        print(title.get_text())
    else:
        count += 1

print(count)

Bonus: Print value of href:

for title in threadtitles:
    print(title["href"])
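
The two snippets above can be combined and tried offline. This is only a sketch: SAMPLE_HTML and find_matching_threads are hypothetical names, and the markup is an assumed stand-in for the forum page, which may be structured differently.

```python
from bs4 import BeautifulSoup

# Assumed stand-in markup; the real page may look different.
SAMPLE_HTML = """
<div>
  <a class="PreviewTooltip" href="/threads/1">FREE proxy list</a>
  <a class="PreviewTooltip" href="/threads/2">Paid SEO service</a>
  <a class="PreviewTooltip" href="/threads/3">Get a free ebook</a>
  <a class="Other" href="/threads/4">free but wrong class</a>
</div>
"""


def find_matching_threads(html, keyword="free"):
    """Return (title, href) pairs for thread links whose title contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for link in soup.find_all("a", class_="PreviewTooltip"):
        title = link.get_text().strip()
        # Compare in lowercase so "FREE" and "free" both match.
        if keyword.lower() in title.lower():
            matches.append((title, link.get("href")))
    return matches


matches = find_matching_threads(SAMPLE_HTML)
for title, href in matches:
    print(title, "->", href)
```

Using link.get("href") instead of link["href"] returns None rather than raising KeyError when a tag has no href attribute.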

See also this.

petezurich
  • the count should be in the else and print above for what I intended but yes that really is a lot simpler lol, didn't know about the get_text function, it really does make things a lot simpler. If only you knew how long I spent thinking about this whole array stuff >.< – Steel Hard Nov 09 '18 at 21:10
  • I know it's not relevant to this thread but do you know if there's a way to pull the value of the 'href' tag inside the 'a' tag? – Steel Hard Nov 09 '18 at 21:16