I'm trying to scrape a website using Selenium, but I'm running into a problem. There are 150 pages I need to check, and their URLs are of the form "base_url&page=X". But when I call driver.get("base_url&page=X"), it strips off the &page=X for some reason.
When I print the link, it shows up correctly as "base_url&page=X", yet clicking it opens just base_url. If I instead copy and paste the link, it brings me to the correct page, "base_url&page=X".
Any idea what the problem is, or how to go about fixing it?
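Before blaming Selenium, it may be worth confirming that the query string actually survives URL construction. A minimal sketch (the example URL here is a hypothetical stand-in, and it uses Python 3's urllib.parse; on Python 2 the module is called urlparse):

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for the real base_url.
base_url = "https://example.com/search?q=widgets"
page_url = base_url + "&page=3"

# Inspect the query string exactly as driver.get() would receive it.
parts = urlsplit(page_url)
print(parse_qs(parts.query))  # {'q': ['widgets'], 'page': ['3']}
```

If the page parameter shows up here but still vanishes in the browser, the site itself may be redirecting (dropping the parameter server-side), which Selenium would faithfully follow.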
import time

from bs4 import BeautifulSoup
from selenium import webdriver

def get_page(url):
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.get(url)
    time.sleep(2)
    data = driver.page_source
    driver.close()
    return BeautifulSoup(data, "html.parser")

for i in range(1, 5):
    page_url = BASE_URL + "&page=" + str(i)
    parsed_site = get_page(page_url)
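Incidentally, launching a fresh Chrome instance for every one of the 150 pages is slow and failure-prone. A sketch of reusing a single driver instead (assuming the same chrome_options and packages as the snippet above; build_page_url is a hypothetical helper introduced here, not part of the original script):

```python
def build_page_url(base_url, page):
    # Centralize URL construction so it can be checked in isolation.
    return "{}&page={}".format(base_url, page)

def scrape_all(base_url, first_page, last_page, chrome_options=None):
    # Deferred imports: assumes the same environment as the question.
    import time
    from bs4 import BeautifulSoup
    from selenium import webdriver

    # One browser for the whole run, closed even if a page fails.
    driver = webdriver.Chrome(chrome_options=chrome_options)
    try:
        for i in range(first_page, last_page + 1):
            driver.get(build_page_url(base_url, i))
            time.sleep(2)
            yield BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()
```

driver.quit() (rather than close()) also shuts down the chromedriver process itself, which otherwise piles up across iterations.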
Stack trace for the timeout, in response to the follow-up answer:
Traceback (most recent call last):
File "/Users/x/PycharmProjects/proj/src/scraper3.py", line 335, in <module>
sys.exit(main())
File "/Users/x/PycharmProjects/proj/src/scraper3.py", line 309, in main
parsed_site = get_next_page(DRIVER, page_url)
File "/Users/x/PycharmProjects/proj/src/scraper3.py", line 267, in get_next_page
DRIVER.get(url)
File "/Users/x/PycharmProjects/proj/venv/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 324, in get
self.execute(Command.GET, {'url': url})
File "/Users/x/PycharmProjects/proj/venv/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 312, in execute
self.error_handler.check_response(response)
File "/Users/x/PycharmProjects/proj/venv/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
(Session info: chrome=64.0.3282.167)
(Driver info: chromedriver=2.35.528157 (4429ca2590d6988c0745c24c8858745aaaec01ef),platform=Mac OS X 10.13.3 x86_64)
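The TimeoutException above means driver.get() gave up waiting for the page to finish loading. One workaround is to wrap the call in a small retry helper (an entirely hypothetical helper, not part of the original script):

```python
def retry(action, attempts=3, exceptions=(Exception,)):
    # Re-run `action` until it succeeds or `attempts` runs out,
    # re-raising the last error.
    for attempt in range(attempts):
        try:
            return action()
        except exceptions:
            if attempt == attempts - 1:
                raise
```

Usage would be something like retry(lambda: DRIVER.get(url), exceptions=(TimeoutException,)), optionally after calling DRIVER.set_page_load_timeout(30) so a stuck load fails fast and gets retried instead of hanging indefinitely.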