I need a nudge to finish out this script.
I'm scraping a newsletter site for a particular substring. The goal is to parse the page for the section called "Companies mentioned" and collect the name of each company into a Python list.
Here is what I have so far. It runs, but it only gets the first item:
from bs4 import BeautifulSoup as bs4
import requests
import re
url = 'http://news.hipsternomics.com/issues/how-much-is-your-personal-data-worth-on-the-black-market-148489'
r = requests.get(url).text
soup = bs4(r, 'html.parser')
companies = []
# Collect every text node that contains the phrase "Companies mentioned"
for elem in soup(text=re.compile(r'^(.*?Companies mentioned\b)')):
    companies.append(elem)
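The loop above only matches the heading text itself, so the company names have to be found by moving forward from that node. A minimal sketch, assuming (hypothetically, since the real page markup was not inspected) that the names sit in the first `<ul>` after the heading; `SAMPLE_HTML` and `companies_after_heading` are illustrative names, not part of the site or of BeautifulSoup:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the names are <li> items in a <ul> after the heading
SAMPLE_HTML = """
<p><strong>Companies mentioned in this issue:</strong></p>
<ul><li>Google</li><li>Apple</li><li>Tesla</li></ul>
"""

HEADING = re.compile(r'Companies mentioned', re.IGNORECASE)

def companies_after_heading(html):
    soup = BeautifulSoup(html, 'html.parser')
    # First text node containing the heading phrase (regex is applied with search)
    heading = soup.find(string=HEADING)
    if heading is None:
        return []
    # find_next walks forward in document order from the heading text node
    ul = heading.find_next('ul')
    if ul is None:
        return []
    return [li.get_text(strip=True) for li in ul.find_all('li')]

print(companies_after_heading(SAMPLE_HTML))
```

If the real page uses a different container (for example a `<p>` per company), the `find_next('ul')` step is the part to change.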
Desired Outcome:
- I'd like to get the mentioned companies into a list like this:
['Google', 'Apple', 'Tesla', 'Nike', 'TJX', 'Ross', 'L Brands', 'Dominoes']
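If the names instead appear in the same text node as the heading, after the colon and separated by commas, that string can be turned into the list above with plain string methods. This is a sketch under that assumption; `parse_company_line` is a hypothetical helper:

```python
def parse_company_line(text):
    """Split 'Companies mentioned...: A, B, C' into ['A', 'B', 'C'].

    Hypothetical format: assumes every name follows the first colon
    and names are separated by commas.
    """
    if ':' not in text:
        return []
    # Keep only the part after the first colon
    _, names = text.split(':', 1)
    # Strip surrounding whitespace and skip empty entries (e.g. a trailing comma)
    return [name.strip() for name in names.split(',') if name.strip()]

line = "Companies mentioned in this issue: Google, Apple, Tesla, Nike, TJX, Ross, L Brands, Dominoes"
print(parse_company_line(line))
```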
I'm also open to ways I can improve the regex to catch variants like "Companies mentioned in this issue:" or "Companies mentioned:", as seen here. Thanks.
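The existing pattern `^(.*?Companies mentioned\b)` already matches both variants, because it only requires the phrase to appear somewhere in the node. A stricter, anchored pattern is one option if body text that merely mentions the phrase causes false matches. A sketch, assuming the heading is a text node of its own; `HEADING_RE` and `is_companies_heading` are illustrative names:

```python
import re

# "Companies mentioned", optionally followed by "in this issue",
# with an optional trailing colon; case-insensitive and tolerant of extra spaces.
# The ^...$ anchors reject nodes where the phrase is part of a longer sentence.
HEADING_RE = re.compile(
    r'^\s*companies\s+mentioned(?:\s+in\s+this\s+issue)?\s*:?\s*$',
    re.IGNORECASE,
)

def is_companies_heading(text):
    return HEADING_RE.match(text) is not None

for sample in ["Companies mentioned:", "Companies mentioned in this issue:",
               "Other companies we like:"]:
    print(repr(sample), is_companies_heading(sample))
```

The same compiled pattern can be passed to BeautifulSoup, e.g. `soup.find_all(string=HEADING_RE)`. Because it is anchored, it will not match a node that also contains the company names after the colon, so it only fits pages where the heading stands alone.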