
I am using BeautifulSoup to parse a webpage. Now I would like to read the Index value 31811.75 from the span:

<span>Underlying Index: <b style="font-size:1.2em;">BANKNIFTY 31811.75</b> </span>

Unfortunately, the span lacks any other identifiers, such as a class. I followed the solutions mentioned in a similar question, but I don't seem to get the whole text:

>>> print(soup.body(text=re.compile('Underlying')))
['Underlying Index: ']

I would like to use the keyword Underlying to extract the text present in the span. How can I do this?

mmcblk1

1 Answer


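I created a synthetic HTML document that also contains a span we don't want to match. The lambda passed to find() selects the first span whose text contains "Underlying", and re.findall() then extracts the decimal number from that span's text.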

from bs4 import BeautifulSoup
import re
html = """
<html><body>
<span>unwanted</span>
<span>Underlying Index: <b style="font-size:1.2em;">BANKNIFTY 31811.75</b> </span>
</body></html>
"""

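soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the bs4 "no parser specified" warning
span = soup.find(lambda tag: tag.name == "span" and "Underlying" in tag.text)
index = re.findall(r"\d+\.\d+", span.text)  # raw string so the regex escapes reach re unchanged
index[0] if len(index) == 1 else None  # re.findall() returns a list; use the value only if exactly one decimal was found. Could default to 0.0 instead of None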

output

'31811.75'
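
As for why the attempt in the question returned only 'Underlying Index: ': a text/string filter matches individual text nodes, and the number lives in a separate text node inside the child <b> tag. An alternative sketch, assuming a bs4 version that accepts the string= argument (4.4.0 or later), is to find that text node and climb to its parent span. The variable names label, span_text and index_value are only illustrative.

```python
import re
from bs4 import BeautifulSoup

html = """
<html><body>
<span>unwanted</span>
<span>Underlying Index: <b style="font-size:1.2em;">BANKNIFTY 31811.75</b> </span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the text node containing the keyword; on its own this is
# just 'Underlying Index: ', as seen in the question.
label = soup.find(string=re.compile("Underlying"))

# Climb to the enclosing <span> and gather the text of all its descendants,
# including the <b> tag that holds the number.
span_text = label.parent.get_text() if label is not None else ""

# Pull the first decimal number out of the combined text.
match = re.search(r"\d+\.\d+", span_text)
index_value = float(match.group()) if match else None
print(index_value)
```

Converting to float is optional; keep match.group() if you want the string instead.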
Rob Raymond