I'm creating a Python project whose goal is to extract some data from a real-estate portal. I use the selenium package and locate elements with XPath.

Generally everything works fine, but when I try to extract the text of a span I run into a problem.

The span's HTML:

<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
text to scrap
</span>

I extract this span using XPath:

my_obj = i.find_element(By.XPATH, './div/div/div[2]/div[3]/div/span')

I think it is correct because it returns a Selenium object, and when I try to get the class attribute using:

print('my_obj', my_obj.get_attribute('class'))

it returns the correct class, some-class.

My problem is that I cannot extract the text of this span, i.e. text to scrap.

I think I have tried everything:

my_obj.text
my_obj.get_attribute('innetText')
my_obj.get_attribute('textContent')
my_obj.get_attribute('innerHTML')

None of the above work.

Any idea what I'm doing wrong?

1 Answer

Given the HTML:

<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>

The text, i.e. text to scrap, is within a text node that is the lastChild of its parent <span>. So to extract the desired text you can use either of the following locator strategies:

  • Using xpath, execute_script() and textContent:

    print(driver.execute_script('return arguments[0].lastChild.textContent;', driver.find_element(By.XPATH, "//span[@class='some-class']")).strip())
    
  • Using css_selector, get_attribute() and splitlines():

    print(driver.find_element(By.CSS_SELECTOR, "span.some-class").get_attribute("innerHTML").splitlines()[-1].strip())
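
To see why lastChild is the right node to read, the structure of the snippet can be inspected offline, without a browser. The following is only a minimal sketch using Python's standard xml.dom.minidom. It assumes the snippet is well-formed XML, which holds for the sample above but not necessarily for a real page. direct_text is a hypothetical helper name introduced here for illustration, not part of Selenium.

```python
from xml.dom import minidom

# Sample markup from the question (assumed to be well-formed XML).
SNIPPET = '''<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg" class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg>
    text to scrap
</span>'''


def direct_text(element):
    """Hypothetical helper: join the element's own text-node children,
    ignoring text that belongs to nested elements, and normalize whitespace."""
    parts = [
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    ]
    return " ".join("".join(parts).split())


span = minidom.parseString(SNIPPET).documentElement
last = span.lastChild

print(last.nodeType == last.TEXT_NODE)  # is the last child a text node?
print(last.data.strip())                # text carried by that node
print(direct_text(span))                # same result without relying on position
```

Collecting every direct text node, as direct_text does, avoids depending on the text being the last child, which may help if the page places the icon after the label instead of before it.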
    

Alternative

As an alternative you can also use Beautiful Soup as follows:

Code Block:

from bs4 import BeautifulSoup

html_text = '''
<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>
'''

soup = BeautifulSoup(html_text, 'html.parser')
last_text = soup.find("span", {"class": "some-class"}).contents[2]
print(last_text.strip())

Console Output:

text to scrap

Another Alternative

As another alternative you can also use lxml.etree as follows:

Code Block:

from lxml import etree

html_text = '''
<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>
'''
x = etree.HTML(html_text)
result = x.xpath('//span[@class="some-class"]/text()[2]') # second text node directly inside the span
print(result[0].strip()) # xpath() returns a list, so take the first item

Console Output:

text to scrap
