
Ultimately, I am just trying to get the href of the first link in Google's search results.

The information I need also exists in an 'a' element, but it is stored in a 'data-href' attribute, and I could not figure out how to extract it (get_attribute('data-href') returns None).

I am using PhantomJS, but I have also tried the Firefox web driver.


The href is displayed in a cite tag in a Google search (it can be found by inspecting the small green link text under each link in the search results).

Selenium apparently finds the cite element, but the text it returns (via element.text or get_attribute('innerHTML')) is not what is shown in the HTML.

For instance, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows "wikimapia.org/.../Fundação-Cristiano-Varella-Hospital..."

I have tried to retrieve the cite element by CSS selector, tag name, class name, and XPath, all with the same result.

links = driver.find_elements_by_css_selector('div.g')  # div[class="g"]
link = links[0]  # I am looking for the first link in the main links section
snippet = link.find_element_by_css_selector('div[class="s"]')  # location of cite tag
cite = snippet.find_element_by_tag_name('cite')  # names chosen to avoid shadowing the built-in next()

The div containing the cite tag (there is only one cite in the div):

    <div class="s">
         <div>
             <div class="f kv _SWb" style="white-space:nowrap">
                  <cite class="_Rm">www.fcv.org.br/</cite>
ballade4op52

3 Answers


Find the first 'a' element inside each search result and get its href attribute value:

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://www.google.com/search?q=test")

results = driver.find_elements_by_css_selector('div.g')
link = results[0].find_element_by_tag_name("a")
href = link.get_attribute("href")

Then you can extract the actual URL from the href value with urlparse (the Python 2 module):

import urlparse

print(urlparse.parse_qs(urlparse.urlparse(href).query)["q"])

Prints:

[u'http://www.speedtest.net/']
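
On Python 3 the same functions live in urllib.parse. The following is only a sketch of that extraction step: the helper name extract_target_url and the sample href are made up for illustration, and it assumes the href carries the target in a q query parameter, as in the output above. If no q parameter is present, it falls back to returning the href unchanged.

```python
from urllib.parse import parse_qs, urlparse


def extract_target_url(href):
    """Return the first value of the `q` query parameter, or href itself if there is none."""
    query = parse_qs(urlparse(href).query)
    targets = query.get("q")
    return targets[0] if targets else href


# Hypothetical redirect-style href, used for illustration only
sample_href = "https://www.google.com/url?q=http://www.speedtest.net/&sa=U"
print(extract_target_url(sample_href))
```

With a live driver, the href passed in would be the value returned by link.get_attribute("href").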
alecxe
  • I could have been more clear. I am actually looking for the href, not the title – ballade4op52 Feb 06 '16 at 14:43
  • When retrieving the href with get_attribute, I still get a link that matches the wrong text being returned. I initially tried to extract the data-href attribute of the 'a' element (www.fcv.org.br), but it returns None. http://wikimapia.org/3789673/pt/Funda%C3%A7%C3%A3o-Cristiano-Varella-Hospital-do-C%C3%A2ncer-de-Muria%C3%A9 ... is the link returned from the href directly, which is why I am trying to read the cite tag instead (www.fcv.org.br). The search query is "Fundação Cristiano Varella - Hospital do Câncer de Muriaé" – ballade4op52 Feb 06 '16 at 14:53
  • @Phillip gotcha, updated the answer, hope it helps. Thanks. – alecxe Feb 06 '16 at 15:28
  • It appears the issue is with how I submit my search query, as I'm presently sending keys to Google's search bar... looking into it – ballade4op52 Feb 06 '16 at 16:07

The search method was the problem. Instead of loading the search URL with the query appended, I was typing the query into the search bar with send_keys and then sending ENTER. One solution is to load the URL for each search directly ('https://www.google.com/search?q=' + query). In that case, reading the text of the cite element and reading the href of the 'a' element both give the expected result, with no need for urlparse. Alternatively, clicking Google's search button appears to work better than sending ENTER.
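
A minimal sketch of building that URL, assuming Python 3. The helper name build_search_url is made up for illustration, and urlencode is used in place of plain string concatenation so that spaces and accented characters in the query are percent-encoded.

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Build a Google search URL with the query percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


url = build_search_url("Fundação Cristiano Varella - Hospital do Câncer de Muriaé")
print(url)
# With a Selenium driver already created, the page could then be loaded with:
# driver.get(url)
```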

ballade4op52

Try this one:

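import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;

public class GoogleSearchPage {
    // locators (the @FindBy fields must be initialized, e.g. with PageFactory.initElements)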
    @FindBy(id = "lst-ib")
    private WebElement searchInputBox;
    @FindBy(name = "btnG")
    private WebElement searchButton;
    @FindBy(id = "ires")
    private WebElement searchResultContainer;
    By searchResultHeader = By.tagName("h3");

    // perform search action with the given text
    public void searchText(String text) {
        searchInputBox.sendKeys(text);
        searchButton.click();
    }

    // collect the text of every h3 header inside the results container
    public List<String> readSearchResults() {
        List<WebElement> searchResults = searchResultContainer
                .findElements(searchResultHeader);

        List<String> searchResultsHeaderText = new ArrayList<String>();
        int size = searchResults.size();
        for (int i = 0; i < size; i++) {
            searchResultsHeaderText.add(searchResults.get(i).getText());
        }
        return searchResultsHeaderText;
    }

}

Complete source: https://github.com/jagdeepjain/ui-automation-testng

Jagdeep