I am using Selenium 2.0, Firefox 11.0, and Java to process a table. The table is made up of td cells: some contain text wrapped in a span element, and others contain input elements that hold their text in the value attribute. My goal is to get the text of every cell so I can output the table contents and compare them against expected values. I thought I would do something like this:

First I locate the table WebElement by id, then collect its cells:

    List<WebElement> cells = tableElem.findElements(By.xpath(".//td"));

Then I would loop through all the cells and call findElements with the XPath ".//input" on each one. If the list is empty I would call getText on the cell; otherwise I would call getAttribute("value") on the input element.

But to my surprise, this took several minutes to run on Firefox (I'm afraid to try it on IE, which is where it's supposed to be tested). Debugging makes it obvious that the bottleneck is the .//input search from the td. It takes upwards of ten seconds, so even with just a few cells my tests take forever. I've tried all sorts of minor variations on the XPath and tried switching to CSS selectors, and I keep getting the same results.

I'd like advice on either tackling this problem differently or optimizing my current method. I was hoping this would take only a couple of seconds.

I've included some sample code that should illustrate the slowdown I'm experiencing. This is not the website I'm screen scraping, but the slowness is the same:

    webDriver.navigate().to("https://accounts.google.com/NewAccount");
    List<WebElement> TDxpath = webDriver.findElements(By.xpath("//td"));
    List<WebElement> TDcss = webDriver.findElements(By.cssSelector("td"));
    for (WebElement td : TDcss) {
        List<WebElement> q = td.findElements(By.cssSelector("input"));
    }
    for (WebElement td : TDxpath) {
        List<WebElement> r = td.findElements(By.xpath(".//input"));
    }
Brian Tompsett - 汤莱恩
newmanne
  • Which version of Selenium are you using? It takes exactly 2.8 seconds to execute your code on my machine. That's together with opening the page. The code is OK, something must be wrong with your environment. – JacekM May 22 '12 at 22:17

1 Answer

Do you really need a browser? You could try HtmlUnitDriver, which will be blazingly fast!

Or you could do it in JavaScript, which also takes only a fraction of the time, and you can return Lists from the script:

    // The cast has to wrap the driver itself, not the result of executeScript(...).
    ((JavascriptExecutor) driver).executeScript(
        "var tds = document.getElementsByTagName('td');" +
        "for (var i = 0; i < tds.length; i++) {" +
        "    var inputs = tds[i].getElementsByTagName('input');" +
        "}"
    );
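
To make the "get Lists from the script" idea concrete, here is a minimal sketch that collects every cell's text in one round trip instead of one query per cell. The class name `CellTextScript` and the helper `toStrings` are hypothetical names used only for illustration, and the script assumes the table element is passed in as `arguments[0]`. Note that `textContent` is not identical to `getText()`: it includes hidden text and does not normalize whitespace, so treat this as an approximation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Selenium.
final class CellTextScript {
    // JavaScript to run in the browser; arguments[0] is assumed to be the table element.
    // For each td, in document order, it yields the first input's value if the
    // cell contains an input, otherwise the cell's own text, trimmed.
    static final String SCRIPT =
        "var cells = arguments[0].getElementsByTagName('td');" +
        "var out = [];" +
        "for (var i = 0; i < cells.length; i++) {" +
        "  var input = cells[i].getElementsByTagName('input')[0];" +
        "  var text = input ? input.value : (cells[i].textContent || cells[i].innerText || '');" +
        "  out.push(text.replace(/^\\s+|\\s+$/g, ''));" +
        "}" +
        "return out;";

    // Converts the Object returned by executeScript (a List when the script
    // returns an array) into a list of strings, preserving order.
    // Null entries become empty strings; a non-List result gives an empty list.
    static List<String> toStrings(Object scriptResult) {
        List<String> texts = new ArrayList<>();
        if (scriptResult instanceof List<?>) {
            for (Object item : (List<?>) scriptResult) {
                texts.add(item == null ? "" : item.toString());
            }
        }
        return texts;
    }
}
```

With a real driver, the call would look something like `Object raw = ((JavascriptExecutor) webDriver).executeScript(CellTextScript.SCRIPT, tableElem);` followed by `CellTextScript.toStrings(raw)`, where `webDriver` and `tableElem` are the names from the question.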
Petr Janeček
  • Anyway, I think the slowest part of this is the communication between the program and the browser. Are you sure you can't come up with fewer than 40 queries to get all the data? What would your use case be in the Google example, and what output would you expect? – Petr Janeček May 22 '12 at 22:54
  • I have a table where I don't know in advance what cells will contain input boxes with text and which cells will just contain plaintext with no input boxes. I want to be able to write what the table should look like at certain stages, and want to just be able to call some compareTable function that will do all the string matching for me. I need to be able to preserve the ordering of the cells to do this. The google example is admittedly not a real use case, but it seemed like an easy way to demonstrate how slow these queries can be. HtmlUnitDriver and JS are possibilities I guess... – newmanne May 23 '12 at 04:17
  • One thing to notice on the google example is that there's a lot of redundant work done since there're actually 9 tables on the page, many of them nested. Once you reduce the search to only one of them, it's much faster, too. Make sure you're not doing any unneeded work in the real case. And yeah, I couldn't come up with any easier query that would get all the `td` elements and all the `inputs` too. It's usually one or the other. Or maybe you could search for concrete text cells by taking advantage of `//td[span[text()='some text'] or input[@value='some text']]` and `following` axis in XPath. – Petr Janeček May 23 '12 at 07:31
  • I thought about using HtmlUnitDriver, but I don't think that will solve the problem. Grabbing the table by id isn't my bottleneck, and it's the only thing I'm using a driver for. The rest of the searches are relative to a specific WebElement, so I think my issue is more with the WebElement class's findElements than anything else. – newmanne May 23 '12 at 15:19
  • It finishes in `HtmlUnitDriver` in about 200 milliseconds on my computer :). – Petr Janeček May 23 '12 at 15:29
  • I gave HtmlUnitDriver a shot, and as you said it was extremely fast. I didn't realize it was a different WebElement implementation. Unfortunately, I don't see how I can use it. I have lots of code in selenium that does navigation, logins, and pushes buttons etc. to manipulate its way to this table page. I thought I could just getSource() with my slow webDriver, and feed it to this and process, but then it crashes because it can't get the css, and validating the css is important. Frustrating, because this really is exactly what I wanted, but I need it to combine nicely with existing code. – newmanne May 23 '12 at 17:37
  • Agh. It should behave just like a Firefox 3.6 in memory. And I say "should", because yes, it can't do everything. See if anything from [this](http://code.google.com/p/selenium/wiki/HtmlUnitDriver) helps. If not ... try the JS. =/ – Petr Janeček May 23 '12 at 18:28
  • I went with JS: `private static String getInput = "var elem = arguments[0];" + "var inputs = elem.getElementsByTagName('input');" + "return inputs;";` and then `List inputBoxes = (List) ((JavascriptExecutor) webDriver).executeScript(SeleniumEx.getInput, elem);` Thank you so much for all your help! – newmanne May 23 '12 at 19:45
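
For the `compareTable` function mentioned in the comments, a minimal sketch might look like the following. It assumes the cell texts have already been collected into a flat list in document order. The class name `TableCompare`, the method signature, and the message format are illustrative choices, not anything from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares expected and actual cell texts in order.
final class TableCompare {
    // Compares cell texts position by position and returns a description of
    // every mismatch; an empty list means the tables match.
    static List<String> compareTable(List<String> expected, List<String> actual) {
        List<String> mismatches = new ArrayList<>();
        if (expected.size() != actual.size()) {
            mismatches.add("cell count: expected " + expected.size()
                    + " but was " + actual.size());
        }
        // Only compare positions present in both lists.
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                mismatches.add("cell " + i + ": expected \"" + expected.get(i)
                        + "\" but was \"" + actual.get(i) + "\"");
            }
        }
        return mismatches;
    }
}
```

Returning the full list of mismatches rather than a boolean makes a failing test report every differing cell at once, which seems useful when the table is expected to change between stages.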