How to get an HTML tag value using re

Question

I'm downloading a web page's HTML with the Python requests library, and then I need to extract some information from it. My attempt below does not return that data. How can I get it?

HTML

<span data-testid="vuln-cvssv2-additional">
    Victim must voluntarily interact with attack mechanism
    <br/>
    Allows unauthorized disclosure of information
    <br/>
    Allows unauthorized modification
    <br/>
</span>

Python

import requests
import re

link = "https://nvd.nist.gov/vuln/detail/CVE-2017-10119"
f = requests.get(link)
deneme = str(f.text)

re_base_vector = r'\<span data-testid\s*\=\s*\"vuln-cvssv2- additional"\s*\>(.*?(\n))+.*?\n\<\\span\>'
find_base_vector = re.search(re_base_vector, deneme)

print(find_base_vector)

print(find_base_vector.group(0))

The output I want

Victim must voluntarily interact with attack mechanism. 
Allows unauthorized disclosure of information. 
Allows unauthorized modification

Why use regex? It is generally a bad idea with HTML. – QHarr Dec 04 '18 at 08:12

QHarr · Accepted Answer · 2018-12-04T08:24:20.410

2

Regex is generally a bad idea for HTML. Parse the page with an HTML parser such as BeautifulSoup, then use a CSS attribute selector:

soup.select_one("span[data-testid='vuln-cvssv2-additional']")

E.g.

from bs4 import BeautifulSoup

html='''
<span data-testid="vuln-cvssv2-additional">
    Victim must voluntarily interact with attack mechanism
    <br/>
    Allows unauthorized disclosure of information
    <br/>
    Allows unauthorized modification
    <br/>
</span>
'''
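# The "lxml" parser needs the lxml package installed; "html.parser" is the built-in alternative.
soup = BeautifulSoup(html, "lxml")
# .text returns the text inside the span, with the <br/> tags dropped.
item = soup.select_one("span[data-testid='vuln-cvssv2-additional']").text
print(item)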
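
If you want each item on its own line, as in the desired output, one option is to iterate over the span's `stripped_strings` instead of reading `.text`. The following is only a minimal sketch under a few assumptions: it swaps in Python's built-in "html.parser" so that lxml is not required, it works on the snippet from the question rather than the live page, and the variable names `span` and `lines` are illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<span data-testid="vuln-cvssv2-additional">
    Victim must voluntarily interact with attack mechanism
    <br/>
    Allows unauthorized disclosure of information
    <br/>
    Allows unauthorized modification
    <br/>
</span>
'''

# Assumption: the built-in "html.parser" is used here instead of "lxml".
soup = BeautifulSoup(html, "html.parser")
span = soup.select_one("span[data-testid='vuln-cvssv2-additional']")

# select_one returns None when nothing matches, so guard before reading the text.
# stripped_strings yields each text fragment with surrounding whitespace removed
# and skips fragments that are whitespace only.
lines = list(span.stripped_strings) if span is not None else []
print("\n".join(lines))
```

For the live page you would pass `f.text` from the question's `requests.get(link)` call in place of `html`, assuming the page still contains that span.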

edited Dec 04 '18 at 08:24

answered Dec 04 '18 at 08:14

QHarr


Yes it worked. Using BeautifulSoup for HTML is a more logical choice. – Ali.Turkkan Dec 04 '18 at 08:23

score 0 · Answer 2 · answered Dec 04 '18 at 08:19

0

BeautifulSoup makes it easier to parse and navigate HTML, and it is simple to apply to a given document.

See the documentation:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
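
As a rough illustration of that kind of navigation, the sketch below looks the span up with `find()` and an `attrs` dictionary, then joins its text fragments with `get_text()`. It assumes the HTML snippet from the question and the built-in "html.parser"; the names `span` and `text` are only for illustration.

```python
from bs4 import BeautifulSoup

html = '''
<span data-testid="vuln-cvssv2-additional">
    Victim must voluntarily interact with attack mechanism
    <br/>
    Allows unauthorized disclosure of information
    <br/>
    Allows unauthorized modification
    <br/>
</span>
'''

soup = BeautifulSoup(html, "html.parser")

# find() with attrs matches the first <span> whose data-testid has this value.
span = soup.find("span", attrs={"data-testid": "vuln-cvssv2-additional"})

# With strip=True, get_text() strips each text fragment, drops empty ones,
# and joins the rest with the given separator.
text = span.get_text(separator="\n", strip=True) if span is not None else ""
print(text)
```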


ak_app

