I am using Python with BeautifulSoup4 and need to retrieve the visible links on a page. Given this code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
links = soup('a')
I would like to create a method is_visible that checks whether or not a link is displayed on the page.
Solution Using Selenium
Since I am also working with Selenium, I know that the following solution exists:
from selenium.webdriver import Firefox
firefox = Firefox()
firefox.get('https://google.com')
links = firefox.find_elements_by_tag_name('a')
for link in links:
    if link.is_displayed():
        print('{} => Visible'.format(link.text))
    else:
        print('{} => Hidden'.format(link.text))
firefox.quit()
Performance Issue
Unfortunately, both the is_displayed method and the text attribute perform an HTTP request to retrieve this information. Things can therefore get really slow when a page has many links or when you have to do this repeatedly.
On the other hand, BeautifulSoup can perform these parsing operations almost instantly once you have the page source. But I can't figure out how to implement is_visible with it.
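To make the goal concrete, here is a minimal sketch of what a purely static is_visible might look like. It rests on a strong assumption: a link counts as hidden only if it, or one of its ancestors, carries the hidden attribute or an inline style with display: none or visibility: hidden. BeautifulSoup only sees the markup, so rules from external stylesheets and anything changed by JavaScript are ignored, and the result can differ from what Selenium's is_displayed reports. The HIDDEN_STYLE pattern and the sample HTML are hypothetical and only there for illustration.

```python
import re

from bs4 import BeautifulSoup

# Inline-style declarations treated as "hidden" (an assumption of this sketch).
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


def is_visible(tag):
    """Heuristic: False if the tag or any ancestor is hidden via inline markup."""
    node = tag
    while node is not None:
        # The boolean HTML attribute `hidden` hides the element and its children.
        if node.has_attr("hidden"):
            return False
        # Only inline styles are inspected; stylesheets and scripts are not.
        if HIDDEN_STYLE.search(node.get("style", "")):
            return False
        node = node.parent
    return True


# Hypothetical sample page used only to exercise the function.
html = """
<body>
  <a href="/a">Shown</a>
  <a href="/b" style="display: none">Inline hidden</a>
  <div style="visibility:hidden"><a href="/c">Hidden by parent</a></div>
  <a href="/d" hidden>Hidden attribute</a>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
visible_links = [a.get_text() for a in soup("a") if is_visible(a)]
print(visible_links)  # ['Shown']
```

Walking up through node.parent is what lets the check catch links hidden by a container. Class-based hiding (for example a CSS class that sets display: none) would need the stylesheet to be parsed as well, which this sketch does not attempt.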