I have a problem with the following code:
import re
from lxml import html
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import requests
import sys
import datetime
print('start!')
print(datetime.datetime.now())
list_file = 'list2.csv'
# The list read from this CSV would be the regular input;
# the single URL below is an example input instead:
url_list = ["http://www.genecards.org/cgi-bin/carddisp.pl?gene=ENO3&keywords=ENO3"]
binary = FirefoxBinary('C:/Program Files (x86)/Mozilla Firefox/firefox.exe')
# I read somewhere that supplying the binary explicitly could help, but the program
# still fails randomly with [WinError 6] Invalid Descriptor, with nothing different
# between a failing run and one that at least gets the webpage (and even then it
# cannot perform any further operation).
for page in url_list:
    print(page)
    browser = webdriver.Firefox(firefox_binary=binary)
    # I also tried creating the browser inside the loop to get around the
    # [WinError 6], but it is not working.
    browser.get(page)
    print("TEST BEGINS")
    soup = BS(browser.page_source, "lxml")
    soup = soup.find("summaries")
    # It fails here: find() returns None, although there is a section whose id
    # is "summaries". soup.find_all("p") works, but I don't want the <p> tags
    # that sit outside of summaries.
    print(soup)  # It does print "None".
    print("TEST ENDS")
I am positive the page source includes "summaries". First there is
<li> <a href="#summaries" ng-click="scrollTo('summaries')">Summaries</a></li>
then there is
<section id="summaries" data-ga-label="Summaries" data-section="Summaries">
As suggested here (Webscraping in python: BS, selenium, and None error) by @alexce, I tried
summary = soup.find('section', attrs={'id':'summaries'})
(Edit: the suggestion was _summaries, but I tested summaries too)
but it does not work either. So my questions are: why does BeautifulSoup not find the summaries section, and why does Selenium keep breaking when I run the script many times in a row (restarting the console works, but that is tedious) or with a list comprising more than four URLs?
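For reference, what I ultimately want, once the lookup returns the section, is just the paragraphs inside it, roughly like this (assuming summary holds the section tag):

# Hypothetical follow-up, once summary is no longer None:
for p in summary.find_all('p'):
    print(p.get_text(strip=True))

Thanks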