
I am trying to get the href attribute value from an anchor tag on a page in my application using Selenium WebDriver (Python), and the result returned has part of the value stripped off.

Here is the HTML snippet -

<a class="nla-row-text" href="/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0" data-reactid="790">

Here is the code I am using -

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Firefox()
driver.get("xxxx")

url_from_attr = driver.find_element(By.XPATH,"(//div[@class='nla-children mfr']/div/div/a)[1]").get_attribute("href")

url_from_attr_raw = "%r"%url_from_attr

print(" URL from attribute -->> " + url_from_attr)
print(" Raw string -->> " + url_from_attr_raw)

The output I am getting is -

/shopping/brands?search=kamera&page=0

instead of -

/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0 OR
/shopping/brands?search=kamera&nm=Canon&page=0

Is this because of the entity representation (&amp;) in the URL? The part between the two entities is what gets stripped. Any help or pointer would be great.

Cœur
Sandeep Dharembra
  • Behind the scenes a call will be made to the webdriver like so: `resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})`. Could you try another browser to see whether you get the same issue? E.g. `driver = webdriver.Chrome()`. Perhaps it's an issue in the geckodriver. – Chuk Ultima Feb 22 '18 at 09:25
  • Well, chromedriver doesn't work either – Sandeep Dharembra Feb 22 '18 at 10:05
  • If nothing works then always use JS. You can refer to this -> [link](https://stackoverflow.com/questions/10596417/is-there-a-way-to-get-element-by-xpath-using-javascript-in-selenium-webdriver) and for getting the href using JS, refer to this [link](https://stackoverflow.com/questions/15439853/get-local-href-value-from-anchor-a-tag) – shank087 Feb 22 '18 at 11:35

1 Answer


As per the given HTML, there is an issue with the Locator Strategy you have tried. You have used the index [1] along with find_element, which is error-prone. An index such as [1] is better applied when a list is returned through find_elements. In this use case an optimized expression would be:

url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']/div/div/a[@class='nla-row-text']").get_attribute("href")

The Locator Strategy can be optimized further as follows:

url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']").get_attribute("href")

Update A

As per your comment, since you still need to use indexing, the optimized Locator Strategy can be:

url_from_attr = driver.find_elements(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']")[0].get_attribute("href")
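Since find_elements returns a plain Python list, the index is zero-based (unlike XPath positions, which start at 1), and an empty list would raise an IndexError. A minimal sketch of a guard around that, where `first_href` is a hypothetical helper name and not part of Selenium:

```python
def first_href(elements):
    """Return the href of the first element in a find_elements() result.

    `elements` is the list returned by driver.find_elements(...).
    Returns None when nothing matched instead of raising IndexError.
    """
    if not elements:
        return None
    # Python lists are zero-based: [0] is the first match.
    return elements[0].get_attribute("href")


# Intended use (needs a live browser session, so it is left commented out):
# anchors = driver.find_elements(
#     By.XPATH, "//div[@class='nla-children mfr']//a[@class='nla-row-text']")
# url_from_attr = first_href(anchors)
```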

get_attribute(attribute_name)

As per the Python-API Source :

    def get_attribute(self, name):
        """Gets the given attribute or property of the element.

        This method will first try to return the value of a property with the
        given name. If a property with that name doesn't exist, it returns the
        value of the attribute with the same name. If there's no attribute with
        that name, ``None`` is returned.

        Values which are considered truthy, that is equals "true" or "false",
        are returned as booleans.  All other non-``None`` values are returned
        as strings.  For attributes or properties which do not exist, ``None``
        is returned.

        :Args:
            - name - Name of the attribute/property to retrieve.

        Example::

            # Check if the "active" CSS class is applied to an element.
            is_active = "active" in target_element.get_attribute("class")

        """

        attributeValue = ''
        if self._w3c:
            attributeValue = self.parent.execute_script(
                "return (%s).apply(null, arguments);" % getAttribute_js,
                self, name)
        else:
            resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
            attributeValue = resp.get('value')
            if attributeValue is not None:
                if name != 'value' and attributeValue.lower() in ('true', 'false'):
                    attributeValue = attributeValue.lower()
        return attributeValue
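To check whether the entity encoding alone could explain the missing nm=Canon part, here is a minimal standard-library sketch. It decodes the href from the question's HTML and resolves it against the page URL, which is roughly what a browser does for the href property. The base URL `https://example.com/` is an assumption, since the real one is elided as "xxxx" in the question, and `expected_href` and `query_params` are hypothetical helper names.

```python
from html import unescape
from urllib.parse import parse_qs, urljoin, urlsplit

# href exactly as written in the HTML snippet from the question
raw_href = "/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0"
# value the question reports getting back
observed = "/shopping/brands?search=kamera&page=0"
# ASSUMPTION: placeholder base URL, the real one is not given
base_url = "https://example.com/"


def expected_href(raw, base):
    """Decode HTML entities, then resolve the result against the page URL."""
    return urljoin(base, unescape(raw))


def query_params(url):
    """Return the query string of url as a dict of lists."""
    return parse_qs(urlsplit(url).query)


resolved = expected_href(raw_href, base_url)
params = query_params(resolved)
# Parameters present after decoding but absent from the observed value
missing = set(params) - set(query_params(observed))

print(resolved)
print(sorted(missing))
```

Decoding &amp; to & keeps all three parameters, so entity handling by itself would not drop nm=Canon. That suggests the observed value came from a different element or a different page state.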

Update B

As you mentioned in your comment, the URL value returned by the method is not present anywhere on the page, which implies that you are trying to access the href attribute too early. So there can be two solutions:

  • Traverse the DOM tree and construct a locator that uniquely identifies the element, induce WebDriverWait with expected_conditions set to element_to_be_clickable, and then extract the href attribute.

  • For debugging purposes, you can add time.sleep(10) to give the element time to render in the HTML DOM, and then try to extract the href attribute.

undetected Selenium
  • The locator might not be optimized but it works. There are other anchor tags under the same div and I will need to use indexing to access the first one. It will be either the one I am currently using, or find_elements and then accessing the first one (from the list) using the locators you have suggested, which return all. The point is, the value it returns is not present on the page, which rules out the possibility that the locator is incorrect and is fetching some other element's href. Point taken though, I will try to modify the locator; however, I believe it has something to do with how get_attribute works – Sandeep Dharembra Feb 22 '18 at 10:15
  • I am sure your locator might be optimized, however as I said, the url value being returned by the method is not present anywhere on the page. So, it is getting to the correct element but may be because of coding decoding, it is stripping some part. Also when you say - url_from_attr = driver.find_elements(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text'][1]").get_attribute("href"), you actually mean url_from_attr = driver.find_elements(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']")[1] and then traversing the list for get_attribute("href") – Sandeep Dharembra Feb 22 '18 at 10:40
  • Updated my answer, let me know the status. – undetected Selenium Feb 22 '18 at 11:02
  • Thanks @DebanjanB. It was not exactly that the element was taking time to load: in my complete script, the browser was moving away from the page, and hence a different element was being located by the locator. I moved the find_element call earlier in the code and it worked – Sandeep Dharembra Feb 22 '18 at 13:25