I am trying to use BeautifulSoup to find the table defined in the “Content Logical Definition” section of the following pages:
1) https://www.hl7.org/fhir/valueset-account-status.html
2) https://www.hl7.org/fhir/valueset-activity-reason.html
3) https://www.hl7.org/fhir/valueset-age-units.html
Several tables may be defined on each page. The table I want is located under the <h2> tag with the text “Content Logical Definition”. Some of the pages have no table in the “Content Logical Definition” section, in which case I want the result to be null. So far I have tried several solutions, but each of them returns the wrong table for some of the pages.
The most recent solution, offered by alecxe, is this:
import requests
from bs4 import BeautifulSoup

urls = [
    'https://www.hl7.org/fhir/valueset-activity-reason.html',
    'https://www.hl7.org/fhir/valueset-age-units.html'
]

for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')

    h2 = soup.find(lambda elm: elm.name == "h2" and "Content Logical Definition" in elm.text)

    table = None
    for sibling in h2.find_next_siblings():
        if sibling.name == "table":
            table = sibling
            break
        if sibling.name == "h2":
            break
    print(table)
This solution returns null when there is no table in the “Content Logical Definition” section, but for the second URL, which does have a table in that section, it returns the wrong table: one at the end of the page.
How can I edit this code so that it returns the table defined directly after the tag with the text “Content Logical Definition”, and returns null if there is no table in that section?
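To make the desired behaviour concrete, here is a rough sketch of what I am after. It assumes (I have not confirmed this for every page) that the table may be nested inside another element such as a <div>, so it is not a direct sibling of the <h2>. The function name find_section_table and the two sample snippets are made up for illustration and are not taken from the real pages; 'html.parser' is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical snippets standing in for the real pages (structure is an assumption).
SAMPLE_WITH_TABLE = """
<div>
  <h2>Content Logical Definition</h2>
  <div><table id="wanted"><tr><td>a</td></tr></table></div>
  <h2>Expansion</h2>
  <table id="other"><tr><td>b</td></tr></table>
</div>
"""

SAMPLE_WITHOUT_TABLE = """
<div>
  <h2>Content Logical Definition</h2>
  <p>No table in this section.</p>
  <h2>Expansion</h2>
  <table id="other"><tr><td>b</td></tr></table>
</div>
"""


def find_section_table(soup, heading_text="Content Logical Definition"):
    """Return the first <table> that appears after the matching <h2> and
    before the next <h2>, or None if there is no such table."""
    h2 = soup.find(lambda elm: elm.name == "h2" and heading_text in elm.get_text())
    if h2 is None:
        return None
    # find_all_next walks the rest of the document in order, so tables nested
    # inside other elements are seen as well, not only direct siblings.
    for elm in h2.find_all_next(["table", "h2"]):
        if elm.name == "table":
            return elm
        # The next <h2> came first, so this section has no table.
        return None
    return None


if __name__ == "__main__":
    for html in (SAMPLE_WITH_TABLE, SAMPLE_WITHOUT_TABLE):
        print(find_section_table(BeautifulSoup(html, "html.parser")))
```

With the real pages, the idea would be to pass the soup built from r.content to this function instead of looping over h2.find_next_siblings(). Whether this picks the right table depends on how the sections are actually laid out on those pages.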