Trying to parse a webpage for latest high-ranking vulnerabilities using Python and BeautifulSoup

Question

I was trying to apply what others have suggested from here:

Beautiful Soup: Accessing <li> elements from <ul> with no id

But I can't get it to work. It seems the person from that question had a 'parent' h2 header, but the one I am trying to parse does not.

Here is the webpage I am scraping:

https://nvd.nist.gov/

I think I have located the element I need: <ul id="latestVulns"> and the <li> items inside it.

I want to scrape the section titled "Last 20 Scored Vulnerability IDs & Summaries" and, depending on what the vulnerabilities are, send an email to the appropriate department at my workplace.

Here is my code so far:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://nvd.nist.gov/')
soup = BeautifulSoup(source.content, 'lxml')

section = soup.find('latestVulns')
print(section)

This code prints None.

I'm at a loss.

The first argument of find is the name of the element to look for, such as ul, not an attribute value. To find an element by id, use ```soup.find('ul', {'id':'latestVulns'})``` – Biarys, Jun 08 '18 at 13:13

score 1 · Answer 1 · answered Jun 08 '18 at 13:13

The first argument of find expects the name of the element (the tag), but you are passing in the id, so nothing matches and find returns None.

You can use this to find the tag correctly:

section = soup.find('ul', {'id': 'latestVulns'})
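
As a rough sketch of how the lookup and the follow-up loop over the list items might fit together, the snippet below parses a small inline HTML fragment instead of the live page. The fragment, the CVE IDs, and the helper name extract_vulns are made up for illustration; the real markup at nvd.nist.gov may differ. 'html.parser' is used only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the NVD front page; the real markup may differ.
SAMPLE_HTML = """
<div>
  <ul id="latestVulns">
    <li><a href="/vuln/detail/CVE-0000-0001">CVE-0000-0001</a> - Example summary one</li>
    <li><a href="/vuln/detail/CVE-0000-0002">CVE-0000-0002</a> - Example summary two</li>
  </ul>
</div>
"""


def extract_vulns(html):
    """Return the text of each <li> inside <ul id="latestVulns"> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # First argument is the tag name; the id goes in the attribute filter.
    section = soup.find("ul", {"id": "latestVulns"})
    if section is None:
        # The list was not found, e.g. because the page layout changed.
        return []
    # Join the text fragments of each item with a single space.
    return [li.get_text(" ", strip=True) for li in section.find_all("li")]


for item in extract_vulns(SAMPLE_HTML):
    print(item)
```

With the live page, the same helper could be given source.content from the requests call in the question, assuming the id is still present in the returned HTML. soup.find(id='latestVulns') is an equivalent way to write the lookup.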

answered Jun 08 '18 at 13:13

Wondercricket

Wow, thank you so much for this. Would you happen to know of a tutorial I could follow on how to send emails with Python? This is new territory for me. – Geronimo Jun 08 '18 at 13:16
https://stackoverflow.com/questions/10147455/how-to-send-an-email-with-gmail-as-provider-using-python – Mahendra Singh Meena Jun 08 '18 at 13:17
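
For the email step, a minimal sketch using only the standard library (smtplib and email.message) might look like the following. This is not the approach from the linked question, which deals with Gmail specifically. The host smtp.example.com, the port, the addresses, and the function names build_alert and send_alert are all placeholders chosen for illustration, and a real mail server may require different settings.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, vulns):
    """Build a plain-text message listing the given vulnerability lines."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(vulns)} new scored vulnerabilities"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(vulns))
    return msg


def send_alert(msg, host="smtp.example.com", port=587, user=None, password=None):
    """Send the message over SMTP with STARTTLS (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user and password:
            server.login(user, password)
        server.send_message(msg)


# Building the message needs no network access; send_alert is not called here.
message = build_alert(
    "scanner@example.com",
    "security@example.com",
    ["CVE-0000-0001 - Example summary one"],
)
print(message)
```

Keeping message construction separate from sending makes it possible to inspect or test the email before pointing send_alert at a real server.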