
I want to scrape the text ("Showing 650 results") from a website.

The result I am looking for is:

 Result : Showing 650 results

The following is the HTML code:

    <div class="jobs-search-results__count-sort pt3">
        <div class="jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4">
            Showing 650 results
        </div>
    </div>

Python code:

    response = requests.get(index_url)
    soup = BeautifulSoup(response.text, 'html.parser')
    text = {}
    link = "jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4" 
    for div in soup.find_all('div',attrs={"class" : link}):
        text[div.text]
    text

So far it looks like my code is not working.

David

2 Answers

  1. You don't need soup.find_all if you're looking for only one element; soup.find works just as well.

  2. You can use tag.string, tag.contents, or tag.text to access the inner text.

    div = soup.find('div', {"class": link})
    text = div.string
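
As a minimal, self-contained sketch of the approach above: the `html` string below is an assumption for illustration (it reproduces the fragment from the question; in practice it would be `response.text`). It matches on a single class name instead of the full class string and strips the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Assumed sample input: the HTML fragment from the question, inlined as a string.
html = """
<div class="jobs-search-results__count-sort pt3">
    <div class="jobs-search-results__count-string results-count-string Sans-15px-black-55% pb0 pl5 pr4">
        Showing 650 results
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ with a single name matches any tag that has that class among its classes.
div = soup.find("div", class_="jobs-search-results__count-string")

# get_text(strip=True) removes the leading/trailing whitespace around the text.
result = div.get_text(strip=True)
print("Result :", result)
```

Matching on one class is less brittle than matching the whole attribute value, since the exact-string form only matches when the classes appear in precisely that order.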
cs95
  • or even `tag.text`! ps.: even though it's an old way of calling `.string`, it'll always return the same thing I guess. ([*Actually it depends*](https://stackoverflow.com/questions/25327693/difference-between-string-and-text-beautifulsoup)) =) – Vinícius Figueiredo Aug 02 '17 at 00:15
  • @ViníciusAguiar Thanks :] – cs95 Aug 02 '17 at 00:16
  • I am getting the following error: 'NoneType' object has no attribute 'text' – David Aug 02 '17 at 00:32
  • @David That means the div with the class does not exist. – cs95 Aug 02 '17 at 00:33
  • @cᴏʟᴅsᴘᴇᴇᴅ That is weird. According to the HTML, it is there. – David Aug 02 '17 at 00:47
  • @David Just tried this and I get `Showing 650 results`. – cs95 Aug 02 '17 at 00:50
  • It appears to work just fine with the HTML code, but when it is loaded from the actual website I believe there is a problem with the "soup = BeautifulSoup()" step. If you have time, do you mind trying it from the actual URL? https://www.linkedin.com/jobs/search/?keywords=aircraft%20powerplant%20repairer&location=United%20States&locationId=us%3A0 – David Aug 02 '17 at 01:21
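
The `'NoneType' object has no attribute 'text'` error discussed in the comments above arises because `soup.find` returns `None` when nothing matches. Below is a hedged sketch of a guard for that case. `extract_count` is a hypothetical helper name, and the `missing` string merely stands in for a response that lacks the element (for example, if the site builds that part of the page with JavaScript or requires a login, which is an assumption here and not confirmed by the thread).

```python
from bs4 import BeautifulSoup


def extract_count(html):
    """Return the results-count text, or None if the div is not in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one takes a CSS selector and returns the first match or None.
    div = soup.select_one("div.results-count-string")
    if div is None:
        return None
    return div.get_text(strip=True)


# Assumed sample inputs for illustration only.
present = '<div class="results-count-string pb0">Showing 650 results</div>'
missing = "<html><body><p>Please sign in</p></body></html>"

print(extract_count(present))
print(extract_count(missing))
```

If the helper returns `None` for the live page, printing `response.text` and searching it for the class name is a quick way to check whether the element was ever in the downloaded HTML.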

Old: from BeautifulSoup import BeautifulSoup

"Development on the 3.x series of Beautiful Soup ended in 2011, and the series will be discontinued on January 1, 2021, one year after the Python 2 sunsetting date."

New: from bs4 import BeautifulSoup

"Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree."

Kevin O.