How to click every link and extract content inside - Python Selenium

Question

I want to get the content behind every link with id = "LinkNoticia". My code currently opens the first link and extracts its content, but I can't get to the others.

How can I do it?

This is my code (it works for one link):

from selenium import webdriver

driver = webdriver.Chrome("/selenium/webdriver/chromedriver")
driver.get('http://www.emol.com/noticias/economia/todas.aspx')

driver.find_element_by_id("LinkNoticia").click()

title = driver.find_element_by_id("cuDetalle_cuTitular_tituloNoticia")
print(title.text)

Your posted code doesn't attempt to click anything but the first link. Where is that code? — JeffC, Jul 20 '18 at 20:19
@JeffC: not entirely his fault, that page is f... umm... uses non-standard HTML. — timbre timbre, Jul 20 '18 at 20:32

timbre timbre · Accepted Answer · 2018-07-20T22:07:10.730

First of all, the fact that the page has multiple elements with the same ID is a bug in itself. The whole point of an ID is to be unique within the page. According to the HTML spec:

id = name This attribute assigns a name to an element. This name must be unique in a document.

A lengthy discussion is here.

Since an ID is supposed to be unique, most (all?) Selenium bindings only offer a function that looks for a single element with a given ID (e.g. find_element_by_id). I have never seen a function for finding multiple elements by ID. So you cannot use the ID as your locator directly; you need one of the existing functions that can locate multiple elements, and treat the ID as just another attribute that selects a group of elements. Your choices are:

find_elements_by_xpath
find_elements_by_css_selector
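The CSS selector option might look like the sketch below. The helper name find_links_sharing_id is hypothetical, and the driver is assumed to expose the generic find_elements(by, value) method. The string "css selector" is the value of Selenium's By.CSS_SELECTOR constant; it is spelled out here only so the helper needs no import, and with Selenium 4 the usual form would be By.CSS_SELECTOR from selenium.webdriver.common.by.

```python
def find_links_sharing_id(driver, element_id="LinkNoticia"):
    """Return every <a> element whose id attribute equals element_id.

    Hypothetical helper: the id is matched as an ordinary attribute,
    so all elements carrying it are returned, not just the first one.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    selector = 'a[id="{}"]'.format(element_id)
    return driver.find_elements("css selector", selector)
```

Called as find_links_sharing_id(driver) with the driver from the question, this should return a list that can be looped over like any other list of elements.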

For example, you could change your search like this:

links = driver.find_elements_by_xpath("//a[@id='LinkNoticia']")

That would give you the whole set of links, and you'd need to loop through them to retrieve the actual link (href). Note that if you just click on each link, you navigate away from the page and references in links will no longer be valid. So instead you can do this:

Build a list of hrefs from the links:

hrefs=[]
for link in links:
    hrefs.append(link.get_attribute("href"))

Navigate to each href to check its title:

for href in hrefs:
    driver.get(href)
    title = driver.find_element_by_id("cuDetalle_cuTitular_tituloNoticia")
    # etc
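Putting the two steps together, a minimal sketch could look like this. The function name collect_titles and its parameters are hypothetical, and the code assumes the driver offers the generic find_elements(by, value) method ("xpath" and "id" are the string values of By.XPATH and By.ID). It uses find_elements for the title as well, so a page without that element is skipped instead of raising an exception.

```python
def collect_titles(driver, list_url, link_id="LinkNoticia",
                   title_id="cuDetalle_cuTitular_tituloNoticia"):
    """Visit every article linked from list_url and return {href: title}.

    Hypothetical helper; the default ids are the ones from the question.
    """
    driver.get(list_url)
    links = driver.find_elements("xpath", "//a[@id='{}']".format(link_id))

    # Read all hrefs before navigating anywhere: once the browser leaves
    # the listing page, the link elements can no longer be used.
    hrefs = [link.get_attribute("href") for link in links]

    titles = {}
    for href in hrefs:
        if not href:
            continue  # anchor without an href, nothing to visit
        driver.get(href)
        # find_elements returns an empty list when nothing matches.
        found = driver.find_elements("id", title_id)
        if found:
            titles[href] = found[0].text
    return titles
```

With the driver from the question, this would be called as collect_titles(driver, 'http://www.emol.com/noticias/economia/todas.aspx'). The hrefs are passed to driver.get unchanged, on the assumption that get_attribute("href") already yields a full URL.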

I already posted my new code following your tips, but it still doesn't work. — Raul Escalona, Jul 20 '18 at 21:03
I already posted the new code... but I get this error: Traceback (most recent call last): File "emol1.py", line 14, in title = driver.find_element_by_id("cuDetalle_cuTitular_tituloNoticia") — Raul Escalona, Jul 20 '18 at 21:20
I deleted 'http'+ from your code and it works... Thank you very much — Raul Escalona, Jul 20 '18 at 21:23
