Ultimately, I am just trying to get the href of the first link in Google's search results.
The information I need also exists in an 'a' element, but it is stored in a 'data-href' attribute, and I could not figure out how to extract it (get_attribute('data-href') returns None).
I am using PhantomJS, but I have also tried the Firefox web driver.
The href is displayed in a cite tag in a Google search result (it can be found by inspecting the small green link text under each link in the search results).
Selenium apparently finds the cite element, but the text returned (element.text, or get_attribute('innerHTML'), or (text)) is not what is shown in the HTML.
For instance, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows “wikimapia.org/.../Fundação-Cristiano-Varella-Hospital...”.
I have tried to retrieve the cite element with by_css_selector, tag_name, class_name, and xpath, all with the same results.
links = driver.find_elements_by_css_selector('div.g') # div[class="g"]
link = links[0] # I am looking for the first link in the main links section
next = link.find_element_by_css_selector('div[class="s"]') # location of cite tag
nextB = next.find_element_by_tag_name('cite')
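One way to check whether the mismatch comes from the served markup or from the live DOM would be to parse the raw page source separately and compare. The following is only a minimal sketch using Python's standard html.parser; the class name CiteAndHrefParser, the sample_html string, and the data-href value in it are hypothetical stand-ins, and with Selenium the string would presumably come from driver.page_source.

```python
from html.parser import HTMLParser


class CiteAndHrefParser(HTMLParser):
    """Collect <cite> text and <a data-href="..."> values from static HTML."""

    def __init__(self):
        super().__init__()
        self.cites = []        # text content of each <cite> element
        self.data_hrefs = []   # data-href value of each <a> that has one
        self._in_cite = False

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._in_cite = True
            self.cites.append("")
        elif tag == "a":
            value = dict(attrs).get("data-href")
            if value is not None:
                self.data_hrefs.append(value)

    def handle_endtag(self, tag):
        if tag == "cite":
            self._in_cite = False

    def handle_data(self, data):
        # Accumulate text only while inside a <cite> element.
        if self._in_cite:
            self.cites[-1] += data


# Hypothetical sample markup; the data-href value is an assumption.
sample_html = (
    '<div class="g">'
    '<a href="#" data-href="http://www.fcv.org.br/">Example</a>'
    '<div class="s"><div><div class="f kv _SWb">'
    '<cite class="_Rm">www.fcv.org.br/</cite>'
    '</div></div></div></div>'
)

parser = CiteAndHrefParser()
parser.feed(sample_html)
parser.close()

# Take the first match of each kind, or None if nothing was found.
first_cite = parser.cites[0] if parser.cites else None
first_data_href = parser.data_hrefs[0] if parser.data_hrefs else None
print(first_cite, first_data_href)
```

If the values parsed from the page source differ from what element.text reports, that would suggest the page is being modified after load, or that the first div.g in the live DOM is not the result seen in the inspector.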
The div containing the cite tag (there is only one cite tag in the div):
<div class="s">
<div>
<div class="f kv _SWb" style="white-space:nowrap">
<cite class="_Rm">www.fcv.org.br/</cite>