How to get the html source of a specific element with selenium?

Question

The page I'm looking at contains:

<div id='1'> <p> text 1 <h1> text 2 </h1> text 3 <p> text 4 </p> </p> </div>

I want to get all the text in the div except the text inside the <h> elements (that is, "text 1", "text 3" and "text 4"). There may be several <h> elements or none at all, and there may be several <p> elements, possibly nested, or none.

My idea was to get the HTML source of the div and use a regex to remove the <h> elements. But selenium.get_text does not return the HTML, just the text (all of it!).

I know I can use selenium.get_html_source and then look for the element I need with a regex, but that looks like a waste since selenium knows how to find the element.

Does anyone have a better solution? Thanks :)

score 9 · Accepted Answer · answered Nov 29 '09 at 20:48

The following code will give you the HTML in the div element:

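from selenium import selenium  # Selenium RC (1.x) Python client
sel = selenium('localhost', 4444, browser, my_url)  # browser is a launcher string such as '*firefox'
# once the session is started and the page is open:
html = sel.get_eval("this.browserbot.getCurrentWindow().document.getElementById('1').innerHTML")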

then you can use BeautifulSoup to parse it and extract what you really want.
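
A minimal sketch of that parsing step, for illustration only. To stay runnable without extra installs it uses the standard library's html.parser in place of BeautifulSoup (a swapped-in technique, not the one named above). The class and function names are hypothetical, and the sample string stands in for the innerHTML returned by the call above.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TextOutsideHeadings(HTMLParser):
    """Collect text nodes that are not inside an <h1>..<h6> element."""

    def __init__(self):
        super().__init__()
        self.heading_depth = 0  # greater than 0 while inside a heading
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.heading_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.heading_depth > 0:
            self.heading_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when we are outside every heading.
        if text and self.heading_depth == 0:
            self.chunks.append(text)


def text_outside_headings(html):
    """Return the stripped text chunks of `html` that are not in a heading."""
    parser = TextOutsideHeadings()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.chunks


# Stand-in for the innerHTML of the div from the question.
inner_html = "<p> text 1 <h1> text 2 </h1> text 3 <p> text 4 </p> </p>"
print(text_outside_headings(inner_html))
```

Because the parser only tracks heading depth, nested <p> elements or a missing heading need no special handling.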

I hope it helps

luc

Sorry, I'm new to this site... You meant clicking on the v, right? – Rivka Nov 30 '09 at 08:17
No problem. Thanks. I spent some time a few weeks ago on a similar problem and I am happy to know that it fixed yours too. – luc Nov 30 '09 at 08:35
And what about getting it directly from the WebDriver? For example, you have `wd = webdriver.Firefox()` and want to get this from the `wd` object. – eLRuLL Mar 01 '13 at 13:22

score 4 · Answer 2 · edited Jun 20 '20 at 09:12

Use xpath. From selenium.py:

Without an explicit locator prefix, Selenium uses the following default strategies:

dom, for locators starting with "document."

xpath, for locators starting with "//"

identifier, otherwise
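
As a rough illustration of those three rules, here is a small sketch. The function default_strategy is hypothetical and only mirrors the quoted docstring; it is not Selenium's own code and it ignores explicit prefixes such as "xpath=" or "id=".

```python
def default_strategy(locator):
    """Guess which default strategy applies to a locator with no explicit prefix.

    Hypothetical helper that follows the three rules quoted from selenium.py.
    """
    if locator.startswith("document."):
        return "dom"
    if locator.startswith("//"):
        return "xpath"
    return "identifier"


print(default_strategy("//div[@id='1']"))
print(default_strategy("document.forms[0]"))
print(default_strategy("1"))
```

So a locator that begins with "//" is treated as XPath without needing an "xpath=" prefix.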

In your case, you could try

selenium.get_text("//div[@id='1']/descendant::*[not(self::h1)]")

You can learn more about xpath here.

P.S. I haven't found any good HTML documentation for python-selenium. On the other hand, the docstrings in selenium.py seem to be comprehensive, so I'd suggest reading the source to get a better understanding of how it works.

score 1 · Answer 3 · Hector Minaya · edited Nov 29 '09 at 19:19

What about using jQuery?

Edit:

First you have to add the required .js files; you can get them from www.jQuery.com.

Then all you need to do is call a simple jQuery selector:

alert($("div#1").html());

answered Nov 29 '09 at 18:07

I don't know jQuery. Can you give me an example? Thanks! – Rivka Nov 29 '09 at 18:08

score 0 · Answer 4 · answered Mar 06 '16 at 07:46

The selected answer does not work in Python 3 at the time of writing. Instead use this:

from selenium import webdriver

wd = webdriver.Firefox()
wd.get(url)
# Double quotes outside, so the single-quoted id in the JavaScript does not end the Python string.
html = wd.execute_script("return window.document.getElementById('1').innerHTML")
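
One way to sidestep quoting inside the JavaScript string altogether is to pass the id as a script argument. A small sketch, assuming a WebDriver-style object whose execute_script(script, *args) exposes the extra arguments to the script as `arguments`. The helper name inner_html_by_id is hypothetical.

```python
def inner_html_by_id(driver, element_id):
    """Return the innerHTML of the element with the given id.

    `driver` is assumed to be a Selenium WebDriver instance (for example
    webdriver.Firefox()) that has already loaded the page.
    """
    script = "return document.getElementById(arguments[0]).innerHTML;"
    # The id travels as data, so no quotes need escaping in the script.
    return driver.execute_script(script, element_id)


# Usage (needs Selenium and a browser, so it is left as a comment):
#   from selenium import webdriver
#   wd = webdriver.Firefox()
#   wd.get(url)
#   html = inner_html_by_id(wd, "1")
```

The returned string can then be parsed to drop the heading elements.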

Michael SM
