Web scraping a text element after 2 consecutive spans

Question

How can I pull the text that follows a nested span? Specifically, I am trying to scrape the text 2007-05-02 from this element:

<td style="width: 33%"><span class="label">Start Date<span class="info-tip startdatetip">*</span>:</span> 2007-05-02</td>

My code gives me an AttributeError: 'NoneType' object has no attribute 'next_sibling'

    from bs4 import BeautifulSoup
    import urllib.request
    import csv

    source = urllib.request.urlopen('https://www.clinicaltrialsregister.eu/ctr-search/search?query=&page=1').read()
    soup = BeautifulSoup(source, 'lxml')
    Start_date=soup.find('span',{'class':'label'}, text = 'Start Date').next_sibling
    print(Start_date)

Alternatively, I tried the code below, which returns None:

    Start_date=soup.find('span',{'class':'info-tip startdatetip'}).next_sibling.next_sibling
    print(Start_date)

hopefully this solves your problem ^_^ https://stackoverflow.com/questions/2612548/extracting-an-attribute-value-withbeautifulsoup — Ishan Roychowdhury, Aug 11 '20 at 00:24

score 0 · Accepted Answer · answered Aug 11 '20 at 00:27

You can use stripped_strings on the td:

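    from bs4 import BeautifulSoup
    import requests
    import csv

    # verify=False skips TLS certificate verification for this request
    source = requests.get('https://www.clinicaltrialsregister.eu/ctr-search/search?query=&page=1', verify=False)
    soup = BeautifulSoup(source.text, 'lxml')
    # Container holding the search results, then the first result table in it
    table = soup.find("div",class_="results grid_8plus")
    first_table = table.find_all("table", class_="result")[0]
    # Last td of the first row is the Start Date cell; its last string is the date
    start_date = list(first_table.find("tr").find_all("td")[-1].stripped_strings)[-1]
    print(start_date)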

Output:

2007-05-02
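
To see why the last item is the date, it can help to look at what stripped_strings yields for that cell. This is a minimal sketch that parses the HTML fragment copied from the question instead of the live page, and it assumes the built-in html.parser so that no extra parser is required:

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question (not fetched from the site).
html = (
    '<td style="width: 33%"><span class="label">Start Date'
    '<span class="info-tip startdatetip">*</span>:</span> 2007-05-02</td>'
)
# Assumption: html.parser is used here only to keep the sketch dependency-free.
td = BeautifulSoup(html, "html.parser").find("td")

# stripped_strings walks every text node under the td, in document order,
# with surrounding whitespace removed.
parts = list(td.stripped_strings)
print(parts)      # ['Start Date', '*', ':', '2007-05-02']
print(parts[-1])  # 2007-05-02
```

The date is the only text node that sits directly in the td after the spans, so it comes out last.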

Or, by using next_sibling on the first span in the cell:

    start_date = first_table.find("tr").find_all("td")[-1].find("span").next_sibling.strip()
    print(start_date)
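
As for why the two attempts in the question fail, the following sketch reproduces them on the HTML fragment from the question. It assumes the built-in html.parser and uses the string= keyword (the current name for the text= filter); the variable names are only illustrative:

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question; html.parser is an assumption here.
html = (
    '<td style="width: 33%"><span class="label">Start Date'
    '<span class="info-tip startdatetip">*</span>:</span> 2007-05-02</td>'
)
soup = BeautifulSoup(html, "html.parser")

label = soup.find("span", class_="label")

# The outer span has three children ("Start Date", the inner span, ":"),
# so its .string is None and a string filter cannot match it.
print(label.string)                                            # None
print(soup.find("span", class_="label", string="Start Date"))  # None

# The inner span's siblings all live inside the outer span:
# the next one is ":" and after that there is nothing.
tip = soup.find("span", class_="startdatetip")
print(tip.next_sibling)               # :
print(tip.next_sibling.next_sibling)  # None

# The date is a sibling of the OUTER span, so step from there.
start_date = label.next_sibling.strip()
print(start_date)  # 2007-05-02
```

The first attempt fails because find returns None before next_sibling is ever reached, which is what raises the AttributeError. The second one starts from the inner span, whose siblings end at the closing tag of the outer span.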
