Python - BeautifulSoup - Unable to extract Span Value

Question

I have an XML document with multiple div and span classes, and I'm struggling to extract a text value.

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I want</span>

So far I have written this:

    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    spans = soup.find_all('span', attrs={'class': 'html-tag'})[29]
    print(spans.text)

Unfortunately, this only prints the "This is a Heading that I dont want" value, e.g.:

This is the heading I dont want

Number [29] in the code is the position where the text I need will always appear.

I'm unsure how to retrieve the span value I need.

Please can you assist. Thanks

Why do you search for attrs={'class': 'html-tag'} if the tag you want doesn't have this class? — Roy2012, Jun 17 '20 at 12:27

score 1 · Accepted Answer · answered Jun 17 '20 at 12:29

You can search for <div class="line"> and then select the second <span> inside it.

For example:

from bs4 import BeautifulSoup

txt = '''
   # line 1

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I dont want</span>
   </div>

   # line 2

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I dont want</span>
   </div>

   # line 3

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I want</span>   <--- this is I want
   </div>'''


soup = BeautifulSoup(txt, 'html.parser')
s = soup.select('div.line')[2].select('span')[1]    # select 3rd line 2nd span

print(s.text)

Prints:

This is the text I want
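
If you would rather keep the lookup from the question (finding the heading span by its class and position), a possible alternative is to step from that heading to the next <span> sibling. The sketch below is only an illustration: the sample markup and the names html, headings, heading, value_span and wanted are made up for it, and index 1 stands in for the fixed position [29] from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the snippet in the question.
html = """
<div class="line">
  <span class="html-tag">"Heading A"</span>
  <span>Text A</span>
</div>
<div class="line">
  <span class="html-tag">"Heading B"</span>
  <span>Text B</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same lookup as in the question: collect the heading spans by class.
headings = soup.find_all("span", class_="html-tag")

# Index 1 here stands in for the question's fixed position [29].
heading = headings[1]

# The wanted value sits in the next <span> sibling of the heading span.
value_span = heading.find_next_sibling("span")

# Guard against a heading that has no following <span>.
wanted = value_span.get_text(strip=True) if value_span is not None else None

print(wanted)
```

This assumes the value span always directly follows its heading inside the same div, as in the markup shown in the question.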

Perfect! Thank you so much, still very new to Python :) — MB4ig, Jun 17 '20 at 13:31
