I have the following two types of HTML and need to extract the "Student ID" from each. My code already works for the first type, but I am not sure how to modify it so that it also extracts the "Student ID" correctly from the second type.
Type 1:
student_html='''
<div style= "position:absolute; border:textbook 1px solid">
<span style="font-family: Helvetica; font-size:8px">
Student ID
<span style="font-family: Helvetica; font-size:8px">
123456
<br/>
</span>
</div>
<div style= "position:absolute; border:textbook 1px solid">
<span style="font-family: Helvetica; font-size:8px">
Student Name
<span style="font-family: Helvetica; font-size:8px">
John Doe
<br/>
</span>
</div>
'''
I am using the following code to extract the "Student ID" from the above HTML:
from bs4 import BeautifulSoup

soup = BeautifulSoup(student_html, "lxml")
span_tags = soup.find_all("span")
for span in span_tags:
    # Match the label span, then read the value from the next span
    if span.text.strip() == "Student ID":
        student_id = span.find_next("span").text
    if span.text.strip() == "Student Name":
        student_name = span.find_next("span").text
This is the second type of HTML, where the label and the value sit in the same span, separated by a <br/>.
Type 2:
type2HTML = '''<div style= "position:absolute; border:textbook 1px solid">
<span style="font-family: Helvetica; font-size:8px">
Student ID
<br/>
123456
<br/>
</span>
</div>
<div style= "position:absolute; border:textbook 1px solid">
<span style="font-family: Helvetica; font-size:8px">
Student Name
<br/>
John Doe
<br/>
</span>
</div>
'''
How can I modify the above code to extract the student ID from this second type as well? Similarly, I need to extract other fields: Student Name, Address, Grade, etc.
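One direction I have been considering, as a rough sketch rather than a confirmed solution: instead of looking for the next span, walk over the text nodes in document order and treat the text that follows a known label as its value. The helper name `extract_fields`, the `LABELS` set, the shortened sample strings and the use of the built-in "html.parser" are my own assumptions for illustration. It also assumes every label is immediately followed by its value, which may not hold for all of my pages.

```python
from bs4 import BeautifulSoup

# Hypothetical label set; extend with "Address", "Grade", etc. as needed.
LABELS = {"Student ID", "Student Name"}

# Shortened stand-ins for the two layouts shown above (styles omitted).
type1_sample = '''
<div><span>Student ID<span>123456<br/></span></div>
<div><span>Student Name<span>John Doe<br/></span></div>
'''
type2_sample = '''
<div><span>Student ID<br/>123456<br/></span></div>
<div><span>Student Name<br/>John Doe<br/></span></div>
'''


def extract_fields(html, labels=LABELS):
    """Map each known label to the text node that directly follows it."""
    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings yields every non-empty text node, whitespace-trimmed,
    # in document order, regardless of how the spans are nested.
    strings = list(soup.stripped_strings)
    fields = {}
    for label, value in zip(strings, strings[1:]):
        # Skip pairs where a label is followed by another label (missing value).
        if label in labels and value not in labels:
            fields.setdefault(label, value)
    return fields


print(extract_fields(type1_sample))
print(extract_fields(type2_sample))
```

Because this ignores the tag structure entirely, I would expect it to treat both layouts the same way, but I am not sure whether it is robust enough or whether there is a more idiomatic way to do this.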