How to extract a Google link's href from search results with Selenium?

Question

Ultimately, I am just trying to get the href of the first link in Google's search results.

The information I need also exists in an 'a' element, but it is stored in a 'data-href' attribute, and I could not figure out how to extract it (get_attribute('data-href') returns None).

I am using PhantomJS, but I have also tried the Firefox web driver.

The href is displayed in a cite tag in a Google search (it can be found by inspecting the small green link text under each link in Google's search results).

Selenium apparently finds the cite element, but the text returned (element.text, or get_attribute('innerHTML'), or (text)) is not what is shown in the HTML.

For instance, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows “wikimapia.org/.../Fundação-Cristiano-Varella-Hospital...”

I have tried to retrieve the cite element with by_css_selector, tag_name, class_name, and xpath, all with the same results.

links = driver.find_elements_by_css_selector('div.g') # div[class="g"]
link = links[0] # I am looking for the first link in the main links section
summary = link.find_element_by_css_selector('div[class="s"]') # location of cite tag (renamed from `next`, which shadows the builtin)
cite = summary.find_element_by_tag_name('cite')

div containing cite tag (there is only one in the div)

    <div class="s">
         <div>
             <div class="f kv _SWb" style="white-space:nowrap">
                  <cite class="_Rm">www.fcv.org.br/</cite>

Please let me know if something else is needed in the markup — ballade4op52, Feb 06 '16 at 13:30

score 6 · Answer 1 · edited May 23 '17 at 12:00

Find the first a element inside every search result and get its href attribute value:

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://www.google.com/search?q=test")

results = driver.find_elements_by_css_selector('div.g')
link = results[0].find_element_by_tag_name("a")
href = link.get_attribute("href")

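Then you can extract the actual URL from the href value with urlparse (the module is named urllib.parse in Python 3):

try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

# The real target is carried in the "q" query parameter of the href
print(parse_qs(urlparse(href).query)["q"])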

Prints:

[u'http://www.speedtest.net/']
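
A minimal sketch of the same extraction as a reusable helper, written for Python 3. The function name `extract_target_url` is hypothetical, and it assumes, as in the example above, that a result href carries the real target in its `q` query parameter. An href without a `q` parameter is returned unchanged.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the URL stored in the 'q' query parameter of href.

    Falls back to href itself when there is no 'q' parameter,
    i.e. when the link already points straight at the target.
    """
    params = parse_qs(urlparse(href).query)
    targets = params.get("q")
    return targets[0] if targets else href
```

It would be called on the value read from the result link, for example `extract_target_url(link.get_attribute("href"))`.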

answered Feb 06 '16 at 14:35

alecxe

I could have been more clear. I am actually looking for the href, not the title – ballade4op52 Feb 06 '16 at 14:43
When retrieving the href with get_attribute, I still get a link that matches the incorrect text returned. I initially tried to extract the data-href attribute of the 'a' element (www.fcv.org.br), but that returns None. http://wikimapia.org/3789673/pt/Funda%C3%A7%C3%A3o-Cristiano-Varella-Hospital-do-C%C3%A2ncer-de-Muria%C3%A9 ... is the link returned directly from the href, which is why I am trying to retrieve it from the cite tag (www.fcv.org.br). The search query is "Fundação Cristiano Varella - Hospital do Câncer de Muriaé" – ballade4op52 Feb 06 '16 at 14:53
@Phillip gotcha, updated the answer, hope it helps. Thanks. – alecxe Feb 06 '16 at 15:28
It appears the issue is with how I submit my search query, as I'm presently sending keys to Google's search bar... looking into it – ballade4op52 Feb 06 '16 at 16:07

ballade4op52 · Accepted Answer · 2017-02-22T06:48:00.147

The search method was the problem. Instead of loading the search URL with the query appended, I was typing the query into the search bar with send_keys, followed by ENTER. One solution is to load the URL for each search directly ('https://www.google.com/search?q=' + query). In that case, reading the text of the cite element or the href of the 'a' element gives the same result, with no need for urlparse. Alternatively, clicking Google's search button appears to work better than sending ENTER.
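
A minimal sketch of the URL-based approach, assuming Python 3. The helper names `build_search_url` and `open_search` are hypothetical. `urlencode` is used in place of plain string concatenation so that spaces and accented characters in the query are percent-encoded.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.google.com/search"


def build_search_url(query):
    """Build the search URL with the query percent-encoded in the 'q' parameter."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})


def open_search(driver, query):
    """Load the results page directly instead of typing into the search bar."""
    driver.get(build_search_url(query))
```

With a Selenium driver already created, `open_search(driver, "Fundação Cristiano Varella - Hospital do Câncer de Muriaé")` would load the results page, after which the cite text or the 'a' href can be read as in the question.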

score 1 · Answer 3 · answered Feb 06 '16 at 18:14

Try this one:

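import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;

public class GoogleSearchPage {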
    // locators
    @FindBy(id = "lst-ib")
    private WebElement searchInputBox;
    @FindBy(name = "btnG")
    private WebElement searchButton;
    @FindBy(id = "ires")
    private WebElement searchResultContainer;
    By searchResultHeader = By.tagName("h3");

    // perform search action with the given text
    public void searchText(String text) {
        searchInputBox.sendKeys(text);
        searchButton.click();
    }

    public List<String> readSearchResults() {
        List<WebElement> searchResults = searchResultContainer
                .findElements(searchResultHeader);

        List<String> searchResultsHeaderText = new ArrayList<String>();
        int size = searchResults.size();
        for (int i = 0; i < size; i++) {
            searchResultsHeaderText.add(searchResults.get(i).getText());
        }
        return searchResultsHeaderText;
    }

}

complete source: https://github.com/jagdeepjain/ui-automation-testng
