I'm sure this has been answered somewhere, because it's a very basic question. I cannot, however, for the life of me find the answer on the web. I feel like a complete idiot, but I have to ask, so here goes:

I'm writing Python code that will produce a list of all page addresses on a domain. This is done using Selenium 2; my problem occurs when I try to access the list of all links produced by Selenium.

Here's what I have so far:

from selenium import webdriver
import time

HovedDomene = 'http://www.example.com'
Listlinker = []
Domenesider = []
Domenesider.append(HovedDomene)

driver = webdriver.Firefox()

for side in Domenesider:        

        driver.get(side)
        time.sleep(10)
        Listlinker = driver.find_elements_by_xpath("//a")

        for link in Listlinker: 

            if link in Domenesider:
              pass
            elif str(HovedDomene) in str(link):
              Domenesider.append(side)

print(Domenesider)
driver.close()

The Listlinker variable does not contain the links found on the page. Instead, the list contains (I'm guessing here) Selenium-specific objects called WebElements. I cannot, however, find any WebElement attributes that will give me the links. As a matter of fact, I can't find any examples of WebElement attributes being accessed in Python (at least not in a manner I can reproduce).

I would really appreciate any help you all could give me.

Sincerely Rookie

Rookie
  • I had trouble finding the selenium documentation the first time around, and today, I had the same problem (had to go back in my log to find the page). I'm guessing other people might have the same problem, so I decided to post a [link](http://selenium.googlecode.com/svn/trunk/docs/api/py/index.html) here, for my sake and anyone else reading this. – Rookie Nov 23 '11 at 09:42
  • The line Listlinker = driver.find_elements_by_xpath("//a") will generate a webdriver object which is not iterable. How are you iterating it using for in your code next? – abhi Apr 06 '13 at 21:22

2 Answers

I'm not familiar with Selenium's Python API, but you can probably get the link using the get_attribute(attributename) method. So it should be something like:

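for link in Listlinker:
    linkstr = link.get_attribute("href")  # None if the <a> has no href

    # Skip anchors without an href and pages that are already recorded.
    if not linkstr or linkstr in Domenesider:
        continue
    if HovedDomene in linkstr:
        Domenesider.append(linkstr)  # append the link itself, not the current page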
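
A minimal sketch of the same-domain filtering step on its own, assuming the href strings have already been read from the WebElements with get_attribute("href"). The helper name same_domain_links is hypothetical, and comparing host names with urllib.parse is a swapped-in alternative to the substring test above, not something from the original code. It runs without Selenium, so the sample hrefs are made up.

```python
from urllib.parse import urljoin, urlsplit


def same_domain_links(hrefs, base_url):
    """Return the unique links in hrefs that share base_url's host, in order."""
    base_host = urlsplit(base_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # an <a> without an href yields None
            continue
        absolute = urljoin(base_url, href)  # resolves relative links like "/b"
        if urlsplit(absolute).netloc != base_host:
            continue  # link points to another host
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# With Selenium the input would be built from the WebElements, e.g.
#   hrefs = [link.get_attribute("href") for link in Listlinker]
hrefs = [
    "http://www.example.com/a",
    None,
    "/b",
    "http://other.org/c",
    "http://www.example.com/a",
]
print(same_domain_links(hrefs, "http://www.example.com"))
# ['http://www.example.com/a', 'http://www.example.com/b']
```

Comparing hosts avoids false matches when the domain string merely appears somewhere inside another site's URL, for example in a query parameter.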
Greg Sadetsky
VMykyt
  • btw, you may also have some issues with this: driver.get(side), time.sleep(10), driver.find_elements_by_xpath("//a"), because the page may not be loaded yet and you will receive a NoSuchElementException. – VMykyt Nov 14 '11 at 13:18
  • In the words of the famous Clay Davis: 'shiiiiiiiiit' - because - yeah - I JUST found that out! Thank you so much for your response, VMykyt. That did the trick. I found it in a Selenium documentation page that I, unbelievably, hadn't seen until I googled one of the list objects I was getting. – Rookie Nov 14 '11 at 13:20
  • Thank you for pointing out the issue with time.sleep, VMykyt. I've seen that Selenium has its own wait-for command. I will implement this! – Rookie Nov 14 '11 at 14:18
  • I've been checking up on your tip to not use time.sleep(10) as a page load wait. From reading different posts, it seems to me that waiting for page loading is redundant with Selenium 2. See, for example, [link](http://stackoverflow.com/questions/5868439/wait-for-page-load-in-selenium). The reason is that Selenium 2 has an implicit wait-for-load function. Just thought I'd mention it to you, since you took the time to answer my question. – Rookie Nov 14 '11 at 19:24
  • I didn't tell you not to use time.sleep(xx). I told you to be aware ;). Yes, Selenium is quite good at waiting until the page is loaded. But we may also have some AJAX controls, some JS effects, and so on. Mmmm... I will continue in the answer section because I want to post a code sample. – VMykyt Nov 15 '11 at 08:35
  • @VMykyt The line Listlinker = driver.find_elements_by_xpath("//a") will generate a webdriver object which is not iterable. How are you iterating it using for in your code next? – abhi Apr 06 '13 at 21:34

I've been checking up on your tip to not use time.sleep(10) as a page load wait. From reading different posts, it seems to me that waiting for page loading is redundant with Selenium 2. See, for example, link. The reason is that Selenium 2 has an implicit wait-for-load function. Just thought I'd mention it to you, since you took the time to answer my question.

Sometimes Selenium behaves in unclear ways, and sometimes it throws errors that are of no interest to us.

By byCondition;
T result; // T is IWebElement
const int SELENIUMATTEMPTS = 5;
int timeout = 60 * 1000;
Stopwatch watch = Stopwatch.StartNew(); // System.Diagnostics; started so the timeout can elapse

public T MatchElement<T>() where T : IWebElement
{
    try
    {
        try {
            this.result = this.find(WebDriver.Instance, this.byCondition);
        }
        catch (NoSuchElementException) { }

        while (this.watch.ElapsedMilliseconds < this.timeout && !this.ReturnCondMatched)
        {

            Thread.Sleep(100);
            try {
                this.result = this.find(WebDriver.Instance, this.byCondition);
            }
            catch (NoSuchElementException) { }
        }
    }
    catch (Exception ex)
    {
        if (this.IsKnownError(ex))
        {
            if (this.seleniumAttempts < SELENIUMATTEMPTS)
            {
                this.seleniumAttempts++;
                return MatchElement<T>(); // T cannot be inferred without arguments
            }
        }
        else { log.Error(ex); }
    }
    return this.result;
}

public bool IsKnownError(Exception ex)
{
    // If Selenium finds nothing, it throws an exception. This is bad practice to my mind.
    bool res = (ex.GetType() == typeof(NoSuchElementException));

    // InvalidSelectorException with NS_ERROR_ILLEGAL_VALUE:
    // the issue appears when Selenium interacts with other plugins.
    // This is probably something connected with synchronization.
    res = res || (ex.GetType() == typeof(InvalidSelectorException) && ex.Message
        .Contains("Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE)" +
                "[nsIDOMXPathEvaluator.createNSResolver]"));

    // OpenQA.Selenium.StaleElementReferenceException: Element not found in the cache
    res = res || (ex.GetType() == typeof(StaleElementReferenceException) &&
        ex.Message.Contains("Element not found in the cache"));

    return res;
}

Sorry for the C#, but I'm a beginner in Python. The code is simplified, of course.
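
For readers following along in Python, the polling idea in the C# above can be sketched roughly like this. It is a loose, simplified rendition and not a translation: the name wait_until and its parameters are hypothetical, the "known error" check is reduced to a tuple of exception types to swallow, and the demo uses a fake finder so it runs without a browser. In real use, find would wrap something like driver.find_elements_by_xpath("//a"), and the ignored tuple would hold the Selenium exceptions you consider harmless. Selenium's Python bindings also ship WebDriverWait in selenium.webdriver.support.ui, which may make a hand-rolled loop unnecessary.

```python
import time


def wait_until(find, timeout=60.0, poll=0.1, ignored=()):
    """Call find() until it returns a truthy value or timeout seconds pass.

    Exceptions whose types are listed in ignored are swallowed and retried;
    anything else propagates. Returns None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = find()
            if result:
                return result
        except ignored:
            pass  # treated as "not ready yet"
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Fake finder for demonstration: fails twice, then "finds" one element.
attempts = {"n": 0}


def flaky_find():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("not there yet")
    return ["<a>"]


print(wait_until(flaky_find, timeout=2.0, poll=0.01, ignored=(LookupError,)))
# ['<a>']
```

Unlike a fixed time.sleep(10), this returns as soon as the lookup succeeds and only waits the full timeout when nothing turns up.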

nettux
VMykyt