How to extract just the number from html?

Question

I am trying to extract only the number from this HTML element:

<td bgcolor="green">
    <font color="white">
        "49.8 "
        <small>dBmV</small>
    </font>
</td>

How do I extract only the 49.8 without also getting the dBmV?

I am able to use the XPath to return all of "49.8 dBmV", but when I use the XPath of just the "49.8" text I receive an error.

Error:

invalid selector: The result of the xpath expression "/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()" is: [object Text]. It should be an element.

I have tried:

browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text

which returns 49.8 dBmV

And then:

browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()").text

returns the exception above.

I just want the number 49.8 (which obviously changes). I know I could extract the number afterwards, but I'm hoping there is something tidier that gets the value directly from the HTML.

For this, when Selenium methods fail me, I just split the HTML text using built-in functions, but I don't know whether you'd still want to do it with Selenium or not. – Nenri, Jun 20 '19 at 07:31
Use `.split()` and then take the 0th element of the list. – Germa Vinsmoke, Jun 20 '19 at 07:34
Is there any way to extract it directly from the HTML rather than having to split afterwards? – Glenn Davies, Jun 20 '19 at 07:41

score 2 · Accepted Answer · answered Jun 20 '19 at 08:18

To extract the text 49.8 you can use either of the following approaches:

Using xpath through execute_script() and textContent:

print(driver.execute_script('return arguments[0].firstChild.textContent;', driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']")).strip())

Using xpath through splitlines() and get_attribute():

print(driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']").get_attribute("innerHTML").splitlines()[1])
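
For readers on Selenium 4, where the `find_element_by_*` helpers were removed in favour of `find_element(By.XPATH, ...)`, the first approach can be wrapped in a small helper. This is only a sketch, not the answer's original code: the name `own_text` and the JavaScript that joins the element's direct text nodes are assumptions made for illustration. The helper takes an element that has already been located, so it does not depend on which locator call your Selenium version supports.

```python
# JavaScript run in the browser: join only the element's direct text nodes
# (nodeType 3), so text inside child elements such as <small> is skipped.
OWN_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(function (node) { return node.nodeType === 3; })
    .map(function (node) { return node.textContent; })
    .join('');
"""


def own_text(driver, element):
    """Return the text that belongs directly to `element`, stripped.

    `own_text` is a hypothetical helper name; `driver` is any Selenium
    WebDriver and `element` a WebElement located beforehand.
    """
    return driver.execute_script(OWN_TEXT_JS, element).strip()
```

With Selenium 4 it could be called as `own_text(driver, driver.find_element(By.XPATH, "//td[@bgcolor='green']/font[@color='white']"))`, after `from selenium.webdriver.common.by import By`.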

undetected Selenium

Good one! I did not think of `splitlines()`! – Moshe Slavin Jun 20 '19 at 08:24
That's done it! `print(driver.execute_script('return arguments[0].firstChild.textContent;', driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']")).strip())` is working for me. Thanks for your help! – Glenn Davies Jun 20 '19 at 08:28

score 1 · Answer 2 · answered Jun 20 '19 at 07:52

You can use the first line and just get the number like this:

text_num = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
print(float(text_num.split()[0]))

Hope this helped!
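
If the cell text might lack the space before the unit, or carry a sign, a regular expression is a little more forgiving than `split()`. The following is a minimal sketch under that assumption; `leading_number` is a hypothetical helper name, not part of Selenium, and it works on the string returned by `.text`.

```python
import re

# Optional sign, digits, optional decimal part; matched at the start of the text.
_NUMBER = re.compile(r"\s*([-+]?\d+(?:\.\d+)?)")


def leading_number(text):
    """Return the number at the start of `text` as a float, or None."""
    match = _NUMBER.match(text)
    return float(match.group(1)) if match else None
```

For example, `leading_number(text_num)` could replace `float(text_num.split()[0])`, returning `None` instead of raising when the cell holds no number.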

Israel Pechman

I can do that, but I would prefer to extract it directly from the HTML, if that's possible. – Glenn Davies Jun 20 '19 at 08:06

score 1 · Answer 3 · edited Jun 20 '19 at 10:46

You can replace the extra text like this:

first_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
second_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/small").text
only_first_text = first_text.replace(second_text, '')

edited Jun 20 '19 at 10:46

undetected Selenium

answered Jun 20 '19 at 08:15

Moshe Slavin

Hmm, yeah, that would work, but I'm still hoping to extract the number directly without another line to delete the text. Can I search for the element while ignoring the /small part? – Glenn Davies Jun 20 '19 at 08:20

bertilnilsson · Answer 4 · 2019-06-20T08:23:17.510

0

The `find_element_by_xpath` API in Selenium only supports returning elements. So even though XPath itself can express a selection that returns just the text node you're looking for, such an expression cannot be passed to `find_element_by_xpath`, and XPath alone won't solve this case.
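
The difference between an element and its text node can be seen outside Selenium. As a sketch using only Python's standard library (not something this answer proposed), `xml.etree.ElementTree` exposes the text that precedes an element's first child as `.text`, which here is the number without the `<small>` unit. The markup is the question's snippet with the quotes around 49.8 left out, on the assumption that they come from the browser's inspector rather than the page source. It also assumes the fragment is well-formed XML, which real pages often are not.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<td bgcolor="green">
    <font color="white">
        49.8
        <small>dBmV</small>
    </font>
</td>
"""

cell = ET.fromstring(SNIPPET.strip())  # root element is <td>
font = cell.find("font")               # an element, like what Selenium returns

# .text is the text node before the first child element (<small>),
# so the unit is not included.
number = float(font.text.strip())
unit = font.find("small").text

print(number, unit)
```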

edited Jun 20 '19 at 08:23

answered Jun 20 '19 at 07:46

bertilnilsson

The comment makes a lot of sense, thanks, but I've tried that line and am still getting an error, although now it says it is unable to locate the element: NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()[0]"} I tried several variations as well, trying to make it work, but I keep getting no such element or invalid selector exceptions. – Glenn Davies Jun 20 '19 at 08:00
@GlennDavies Sorry, I was looking at the XPath without considering the Selenium context properly. `find_element_by_xpath` only supports returning elements; it won't work with XPaths that return anything else. I will update my answer now. – bertilnilsson Jun 20 '19 at 08:20
