I'm trying to scrape data from a message board and have hit a wall I can't seem to get around. I've managed to use Selenium to click through to a page that I want to pull data from, but I need to pass it into Beautiful Soup first (or at least I think I do). What I can't figure out is how to tell BS that the page I've landed on is the one to parse, without explicitly calling a get on the url.
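For context, here's roughly how I'm getting to the page in the first place (the url and driver choice below are just placeholders for my actual setup):

from selenium import webdriver

# Placeholder setup -- the real script points at the actual board url
driver = webdriver.Firefox()
driver.get('http://example.com/message-board')

# This part works: I can click through to the thread I'm after
driver.find_element_by_class_name('subject-link').click()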
I tried to get around this by defining the url myself, but it still comes back as a NoneType.
Currently using Python 2.7 on a Mac.
Here's my current code:
subj = driver.find_element_by_class_name('subject-link').click()
cu = driver.current_url
sub2 = driver.get(cu)
print(sub2)
My expectation was that the url would print, but instead it prints "None". My assumption is that if I can get sub2 to be the url, I'll then be able to get lists of strings for each of these categories:
dads_starts = []
dads_participating = []
dads_messages = []
soupm = BeautifulSoup(subj.content, "lxml")
### Appends dads_starts
for d in soupm.find(class_='disabled-link'):
    dads_starts.append(d.text)
### Appends dads_participating
for d in soupm.findAll(class_='disabled-link'):
    dads_participating.append(d.text)
### Appends dads_messages
for d in soupm.findAll(class_='message-text'):
    dads_messages.append(d.text)