Pass an already open webpage (with Selenium) to BeautifulSoup?

Question

I have a web page open and logged in via WebDriver code. I am using WebDriver because the page requires a login and various other actions before I am ready to scrape.

The aim is to scrape data from this open page. I need to find links and open them, so Selenium WebDriver and BeautifulSoup will have to work together a lot.

I looked at the bs4 documentation, but the BeautifulSoup(open("ccc.html")) pattern throws an error when I pass it a URL:

soup = bs4.BeautifulSoup(open("https://m/search.mp?ss=Pr+Dn+Ts"))

OSError: [Errno 22] Invalid argument: 'https://m/search.mp?ss=Pr+Dn+Ts'

I assume this is because it's not a .html file?

see [how to get innerHTML of whole page in selenium driver](https://stackoverflow.com/questions/35905517/how-to-get-innerhtml-of-whole-page-in-selenium-driver) — robyschek, Jan 23 '17 at 17:26

score 8 · Accepted Answer · edited May 10 '17 at 22:31


You are trying to open a page by its web address. open() will not do that, since it only opens local files; use urlopen() instead:

import bs4
from urllib.request import urlopen  # Python 3
# from urllib2 import urlopen  # Python 2

url = "your target url here"
soup = bs4.BeautifulSoup(urlopen(url), "html.parser")

Or use requests, the "HTTP for Humans" library:

import requests

response = requests.get(url)
soup = bs4.BeautifulSoup(response.content, "html.parser")

Also note that it is strongly advisable to specify a parser explicitly. I've used html.parser here; other parsers, such as lxml and html5lib, are available.
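To illustrate why the original call failed, here is a minimal sketch of the inputs BeautifulSoup accepts: a markup string or an open handle to a local file. The sample HTML, the temporary file, and the example.com URL are assumptions made up for the demonstration. The exact OSError subclass and errno raised by open() on a URL depend on the platform.

```python
import os
import tempfile

from bs4 import BeautifulSoup

html = "<html><body><a href='/a'>A</a></body></html>"

# 1) A markup string works directly.
soup_from_string = BeautifulSoup(html, "html.parser")

# 2) An open handle to a *local* file works too.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(html)
    path = tmp.name
try:
    with open(path, encoding="utf-8") as fh:
        soup_from_file = BeautifulSoup(fh, "html.parser")
finally:
    os.remove(path)

# 3) open() treats its argument as a file system path, so a URL fails
#    with some OSError before BeautifulSoup is ever involved.
error_name = None
try:
    open("https://example.com/search?q=x")
except OSError as exc:
    error_name = type(exc).__name__
```

So the file extension is not the issue: the argument to open() has to be a local path, and anything fetched over HTTP has to reach BeautifulSoup as a string, bytes, or a file-like response object.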

"I want to use the exact same page (same instance)"

A common way to do that is to take driver.page_source and pass it to BeautifulSoup for further parsing:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get(url)

# wait for page to load..

source = driver.page_source
driver.quit()  # remove this line to leave the browser open

soup = BeautifulSoup(source, "html.parser")
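Since the question mentions finding links and opening them, the page_source approach can be wrapped in small helpers that work on a driver that is already open and logged in. This is only a sketch: the helper names soup_from_driver, extract_links, and scrape_linked_pages are hypothetical and not part of Selenium or bs4. It assumes the driver exposes the standard page_source and current_url attributes and the get() method, and that driver.quit() has not been called.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def soup_from_driver(driver):
    """Parse whatever the given WebDriver is currently showing."""
    return BeautifulSoup(driver.page_source, "html.parser")


def extract_links(soup, base_url):
    """Return absolute URLs for every <a> tag that has an href, in document order."""
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page they were found on.
        links.append(urljoin(base_url, anchor["href"]))
    return links


def scrape_linked_pages(driver):
    """Visit each link found on the current page and yield (url, soup) pairs."""
    start_soup = soup_from_driver(driver)
    # The link list is built before navigating away from the start page.
    for link in extract_links(start_soup, driver.current_url):
        driver.get(link)  # same browser session, so the login state is kept
        yield link, soup_from_driver(driver)
```

Calling scrape_linked_pages(driver) on the logged-in driver then gives one soup per linked page without issuing any request outside the browser. Pages that load content with JavaScript may still need a wait before page_source is read.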

answered Jan 23 '17 at 17:17 by alecxe · edited May 10 '17 at 22:31 by Corey Goldberg


I think I didn't explain properly: the page is already open. :( I want to use the exact same page (same instance) opened by Selenium. In both examples I assume a new URL-based request is made to open/get the data. – Sid Jan 23 '17 at 17:22

@Sid alright, I've updated the answer - please see if this is what you've meant. Thanks. – alecxe Jan 23 '17 at 17:25
The third one was exactly what I was looking for. :) Thanks – Sid Jan 23 '17 at 17:31
