
I have been using Python Selenium for web automation testing. The key part of automation is finding the right element for a user-visible object in an HTML page. The following API works most of the time, but not all the time.

find_element_by_xxx, where xxx can be id, name, xpath, tag_name, etc.
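For illustration, the usual lookup styles look like this (a minimal sketch; the URL and locators below are hypothetical):

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://example.com")

# The common locator strategies; the ids/names here are made up.
el = driver.find_element_by_id("username")
el = driver.find_element_by_name("q")
el = driver.find_element_by_xpath("//form//input[1]")
el = driver.find_element_by_tag_name("input")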

When the HTML page is too complicated, I would like to search the DOM tree myself. I wonder if it's possible to ask the Selenium server to serialize the entire DOM (along with the element ids that can be used to perform actions through the WebDriver server). The client side (the Python script) could then run its own search algorithm to find the right element.

Note that Python Selenium can get the entire HTML page via

drv.page_source

However, parsing this doesn't give the internal element ids from the Selenium server's point of view, so it isn't useful here.

EDIT1: Paraphrasing to make it clearer (thanks @alecxe): what's needed here is a serialized representation of all the DOM elements (with their DOM structure preserved) in the Selenium server; this serialized representation can be sent to the client side (a Python Selenium test app), which can then do its own search.

packetie
  • What do you mean by "internal element id from selenium server's point of view"? In the DOM there is no *internal* element id: the id is a public characteristic of elements. Perhaps you mean to refer to the identifier that Selenium associates with elements when you use the functions for finding elements. This is separate from the DOM and a serialization of the DOM won't give you this. Also, I've written extensive test suites with Selenium and never found a case where serializing the whole DOM was necessary. This question looks like an [XY problem](http://meta.stackexchange.com/q/66377/241526). – Louis Sep 02 '14 at 09:37
  • Thanks @Louis for the clarification on the element identifier. Yes, the serialization should have both the DOM elements and the element identifiers. The point is to do the search entirely on the client side. – packetie Sep 02 '14 at 14:26
  • @Louis, the searching I need to perform is too complex for the find_element_by... type of API. I don't have the actual code. To give you an idea, it's like finding the text field that's "close" to the label "Limit". It cannot be expressed precisely enough with an XPath, a CSS selector, etc. – packetie Sep 02 '14 at 20:58
  • @Louis, yes, in theory it is doable with a series of `find_element_by` calls, but that's extremely inefficient. I am definitely interested in seeing your `execute_script` example; please share the link if possible. – packetie Sep 03 '14 at 02:20

6 Answers

20

The Problem

Ok, so there may be cases where you need to perform some substantial processing of a page on the client (Python) side rather than on the server (browser) side. For instance, if you have some sort of machine learning system already written in Python and it needs to analyze the whole page before performing actions on it, then although it is possible to do this with a bunch of find_element calls, it gets very expensive because each call is a round-trip between the client and the server. And rewriting the system to work in the browser may be too expensive.
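For a sense of the cost, here is a hedged sketch of the naive approach to the "find the field near the label Limit" scenario from the comments (assuming an existing driver); every find_elements call and every .text access is its own round-trip:

# Naive client-side search: each attribute access below goes back to the
# WebDriver server, so the cost grows with the number of labels on the page.
labels = driver.find_elements_by_tag_name("label")  # one round-trip
target = None
for label in labels:
    if label.text.strip() == "Limit":               # one round-trip per label
        target = label
        break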

Why Selenium's Identifiers Won't Do It

However, I do not see an efficient way to get a serialization of the DOM together with Selenium's own identifiers. Selenium creates these identifiers on an as-needed basis, when you call find_element or when DOM nodes are returned from an execute_script call (or passed to the callback that execute_async_script gives to the script). But if you call find_element to get identifiers for each element, then you are back to square one. I could imagine decorating the DOM in the browser with the required information but there is no public API to request some sort of pre-assignment of WebElement ids. As a matter of fact, these identifiers are designed to be opaque so even if a solution managed somehow to get the required information, I'd be concerned about cross-browser viability and ongoing support.
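For instance, a DOM node returned from execute_script only materializes as a WebElement (with a freshly minted identifier) once it reaches the client; the Python binding exposes it as the element's id attribute, but its format is an implementation detail (a minimal sketch, assuming an existing driver):

# The identifier exists only because Selenium handed this node back to us.
body = driver.execute_script("return document.body")
print(body.id)  # opaque WebElement identifier; do not rely on its format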

A Solution

There is however a way to get an addressing system that works on both sides: XPath. The idea is to parse the DOM serialization into a tree on the client side, get the XPath of the nodes you are interested in, and use that to fetch the corresponding WebElement. So if you'd have to perform dozens of client-server roundtrips to determine which single element you need to click on, you'd be able to reduce this to an initial query of the page source plus a single find_element call with the XPath you need.

Here is a super simple proof of concept. It fetches the main input field of the Google front page.

from io import StringIO

from selenium import webdriver
import lxml.etree

#
# Make sure that your chromedriver is in your PATH, and use the following line...
#
driver = webdriver.Chrome()
#
# ... or, you can put the path inside the call like this:
# driver = webdriver.Chrome("/path/to/chromedriver")
#

parser = lxml.etree.HTMLParser()

driver.get("http://google.com")

# We get this element only for the sake of illustration, for the tests later.
input_from_find = driver.find_element_by_id("gbqfq")
input_from_find.send_keys("foo")

html = driver.execute_script("return document.documentElement.outerHTML")
tree = lxml.etree.parse(StringIO(html), parser)

# Find our element in the tree.
field = tree.find("//*[@id='gbqfq']")
# Get the XPath that will uniquely select it.
path = tree.getpath(field)

# Use the XPath to get the element from the browser.
input_from_xpath = driver.find_element_by_xpath(path)

print "Equal?", input_from_xpath == input_from_find
# In JavaScript we would not call ``getAttribute`` but Selenium treats
# a query on the ``value`` attribute as special, so this works.
print "Value:", input_from_xpath.get_attribute("value")

driver.quit()

Notes:

  1. The code above does not use driver.page_source because Selenium's documentation states that there is no guarantee as to the freshness of what it returns. It could be the state of the current DOM or the state of the DOM when the page was first loaded.

  2. This solution suffers from the exact same problems that find_element suffers from regarding dynamic content: if the DOM changes while the analysis is occurring, then you are working on a stale representation of the DOM. (A retry sketch follows these notes.)

  3. If you have to generate JavaScript events while performing the analysis, and these events change the DOM, then you'd need to fetch the DOM again. (This is similar to the previous point, but a solution that uses find_element calls could conceivably avoid the problem described here by ordering the sequence of calls carefully.)

  4. lxml's tree could possibly differ structurally from the DOM tree in such a way that the XPath obtained from lxml does not address the corresponding element in the DOM. What lxml processes is the cleaned-up, serialized view that the browser has of the HTML passed to it. Therefore, so long as the code is written to prevent the problems mentioned in points 2 and 3, I do not see this as a likely scenario, but it is not impossible.
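As for point 2, a hedged retry sketch (click_via_xpath is a hypothetical helper, not part of Selenium):

from selenium.common.exceptions import (
    NoSuchElementException,
    StaleElementReferenceException,
)

def click_via_xpath(driver, path, retries=3):
    # If the DOM changed under us, the XPath derived from the old dump may
    # point at nothing or at a stale node; retry a bounded number of times.
    for _ in range(retries):
        try:
            driver.find_element_by_xpath(path).click()
            return
        except (NoSuchElementException, StaleElementReferenceException):
            pass  # ideally re-dump the DOM and re-derive `path` here
    raise RuntimeError("element kept going stale: " + path)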

Louis
  • Happy to help! You should also look at the other answer I've submitted, since it is the method I mentioned in my comments earlier and is a method that would take care of a great deal of cases. It may not be what you need now but it is a good thing to know, in general. – Louis Sep 03 '14 at 14:24
17

Try:

find_elements_by_xpath("//*")

That should match all elements in the document.
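For instance (a minimal sketch, assuming an existing driver); note that the result is a flat list, a point raised in the comments below:

# Returns one WebElement per node in the document; the tree structure
# is not preserved in the returned list.
all_elements = driver.find_elements_by_xpath("//*")
print(len(all_elements))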

UPDATE (to match question refinements):

Use JavaScript and return the DOM as a string:

driver.execute_script("return document.documentElement.outerHTML")
David K. Hess
  • Thanks David for the idea. Unfortunately, it will return an array and lose the hierarchy/tree structure of the elements. – packetie Aug 31 '14 at 18:35
  • Sounds like you want to do this then: http://stackoverflow.com/questions/10520294/locating-child-nodes-of-webelements-in-selenium?rq=1 – David K. Hess Sep 01 '14 at 22:21
  • David, I have used XPath in Selenium automation scripts. However, with XPath the client side (the Selenium script) asks the (Selenium) server to do the XPath search. I would like to get the serialized DOM structure sent to the client side and let the client do the XPath search itself. – packetie Sep 02 '14 at 01:23
2

See my other answer for the issues regarding any attempts at getting Selenium's identifiers.

Again, the problem is to reduce a bunch of find_element calls so as to avoid the round-trips associated with them.

A different method from my other answer is to use execute_script to perform the search in the browser and then return all the elements needed. For instance, fetching an element, its parent, and its text separately would take three round-trips, but this code reduces them to one:

el, parent, text = driver.execute_script("""
var el = document.querySelector(arguments[0]);
return [el, el.parentNode, el.textContent];
""", selector)

This returns an element, the element's parent and the element's textual contents on the basis of whatever CSS selector I wish to pass. In a case where the page has jQuery loaded, I could use jQuery to perform the search. And the logic can get as complicated as needed.
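For instance, a hedged sketch of the jQuery variant (this assumes the page itself already loads jQuery; Selenium does not inject it):

# Each DOM node in the returned array is converted to a WebElement.
elements = driver.execute_script(
    "return jQuery(arguments[0]).toArray();", selector)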

This method takes care of the vast majority of cases where reducing round-trips is desirable, but it does not take care of a scenario like the one I illustrated in my other answer.

Louis
0

You can try to utilize the page object pattern; that sounds closer to what you are looking for in this case. You might not convert everything to it, but at least for this part you might want to consider it.

http://selenium-python.readthedocs.org/en/latest/test-design.html?highlight=page%20object
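A minimal page-object sketch (the class name and locators are illustrative, not taken from the linked documentation):

class SearchPage(object):
    """Encapsulates one page; element lookup lives here, not in the tests."""

    def __init__(self, driver):
        self.driver = driver

    @property
    def search_box(self):
        return self.driver.find_element_by_name("q")

    def search(self, text):
        self.search_box.send_keys(text)
        self.search_box.submit()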

You can also loop through all the elements of the page and save them off one at a time, but there should be some library that can do that. I know .NET has the Html Agility Pack; I'm not sure about Python.

Update: I found this... perhaps it will help you: Html Agility Pack for Python

mutt
  • Thanks for the link; this is about a design pattern, though. It doesn't say how to retrieve all the elements at once from the Selenium server. Conceptually it is there, I'm just not sure how to retrieve it. – packetie Aug 18 '14 at 16:36
0

Two ways I know of are:

get_source = driver.page_source

Secondly, using JavaScript:

pageSource = driver.execute_script("return document.documentElement.outerHTML;")
Kumar Rishabh
-1

Actually, you can do this quite easily: write the output to a stream, e.g. var w = window.open... and then document.write...

Recursively iterate through the document object, JSON.stringify-ing each property as you go. I suggest you throw in typeof as well.

// Serialize each enumerable property of `obj`; collect all of them
// (returning inside the loop would emit only the first property).
function recurse(obj) {
    var parts = [];
    for (var i in obj) {
        try {
            parts.push(typeof obj[i] + ":" + i + ":" + JSON.stringify(obj[i]));
        } catch (e) { /* circular/host objects make JSON.stringify throw */ }
    }
    return parts.join("\n");
}
var s = recurse(document);

I'd suggest adding some sort of filtering to remove properties that you don't want to see. Also, I doubt it would run as-is, since browsers detect and escape out of recursive loops.
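If the goal is the question's serialized DOM (structure preserved) rather than a full property dump, a hedged sketch run from the Python side might look like this; domToJson is a hypothetical helper defined inline, and it only recurses over childNodes, so it never hits a circular reference:

import json

dom_json = driver.execute_script("""
    function domToJson(node) {
        // Keep only tag names, attributes and children.
        var out = {tag: node.nodeName, attrs: {}, children: []};
        var i;
        for (i = 0; node.attributes && i < node.attributes.length; i++) {
            out.attrs[node.attributes[i].name] = node.attributes[i].value;
        }
        for (i = 0; i < node.childNodes.length; i++) {
            out.children.push(domToJson(node.childNodes[i]));
        }
        return out;
    }
    return JSON.stringify(domToJson(document.documentElement));
""")
tree = json.loads(dom_json)  # client-side tree, ready for a custom search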

I found this question while looking for something similar, but I was hoping for a DataTable object (I'm using .NET) that I could bind to some sort of debugging window, something better than Chrome's tools. I used to use Firebug for this, but that is sorta dead.

So you could also get this data, but in real time, using a debugger.

user1529413