I want to scrape a list of links. I cannot do this directly with BeautifulSoup because of the way the HTML is structured.
start_list = soup.find_all(href=re.compile('id='))
print(start_list)
[<a href="/movies/?id=actofvalor.htm"><b>Act of Valor</b></a>,
<a href="/movies/?id=actionjackson.htm"><b>Action Jackson</b></a>]
I am looking to pull just the href values. I am thinking of some sort of filter: put all of the bold tags into a list, then filter them out of another list that contains the output above.
start_list = soup.find_all('a', href=re.compile('id='))
start_list_soup = BeautifulSoup(str(start_list), 'html.parser')
things_to_remove = start_list_soup.find_all('b')
The idea is to loop through things_to_remove and remove every occurrence of its contents from start_list.
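
For reference, here is a minimal sketch of that filtering idea, assuming the two links shown above as sample input (the `html` string and the extra "About" link are made up for illustration). It strips each `<b>` tag out of its parent link with `decompose()`. It also shows that a Beautiful Soup tag exposes its attributes like a dictionary, so the href can be read without removing anything; whether that fits the real page is an assumption.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup based on the output shown in the question.
html = '''
<a href="/movies/?id=actofvalor.htm"><b>Act of Valor</b></a>
<a href="/movies/?id=actionjackson.htm"><b>Action Jackson</b></a>
<a href="/about.htm">About</a>
'''

soup = BeautifulSoup(html, 'html.parser')
start_list = soup.find_all('a', href=re.compile('id='))

# Filtering idea: remove every <b> tag (and its text) from each matched link.
for link in start_list:
    for bold in link.find_all('b'):
        bold.decompose()

# What is left of each link once the bold tags are gone.
stripped = [str(link) for link in start_list]

# Alternative: read the href attribute directly from each tag.
hrefs = [link['href'] for link in start_list]
print(hrefs)
# ['/movies/?id=actofvalor.htm', '/movies/?id=actionjackson.htm']
```

Note that removing the bold tags still leaves the surrounding `<a>` markup in place, so the attribute lookup is what actually isolates the href.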