0

How can I extract the values '1.00 TK = 779.8' from the HTML code below?

I tried the code below, but it didn't work:

import requests
from bs4 import BeautifulSoup

page = requests.get(<url>).text

# The page content includes this HTML:
# <span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>

soup = BeautifulSoup(page, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())

ERROR:

 AttributeError: 'NoneType' object has no attribute 'find_next'
itgeek

2 Answers

0

Use find_next(), which returns the first match:

from bs4 import BeautifulSoup

html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())

Output:

1.00 TK = 779.8
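
The AttributeError from the question appears when find() returns None, that is, when the element is not present in the HTML that was parsed. A minimal sketch that guards against that case follows. The helper name first_text_by_id is hypothetical, the snippet is simplified (the Angular attributes are dropped), and find(string=True) is used here in place of find_next(text=True) so the search stays inside the element.

```python
from bs4 import BeautifulSoup


def first_text_by_id(html, element_id):
    """Return the first text node inside the element with this id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # The element is absent from this HTML, for example because
        # JavaScript adds it only after the page has loaded.
        return None
    text = element.find(string=True)
    return text.strip() if text else None


# Simplified copy of the snippet from the question (assumption: attributes trimmed).
snippet = (
    '<span id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8'
    "<span>Disk Drive Value</span>(DDV) </span>"
)

print(first_text_by_id(snippet, "driveValue"))
print(first_text_by_id("<p>no such element</p>", "driveValue"))
```

If the second call's result (None) is what you get for the real page, the element is most likely not in the downloaded HTML at all.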

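Edit: The page is probably loaded dynamically, so the element is missing from the HTML that requests downloads. Use Selenium to render the page first: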

from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

URL = "https://www.westernunion.com/us/en/web/send-money/start?SrcCode=12345&ReceiveCountry=IN&SendAmount=100&ISOCurrency=CNY&FundsOut=BA&FundsIn=CreditCard"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"C:\path\to\chromedriver.exe"))
driver.get(URL)
sleep(10)  # crude wait for the JavaScript-rendered content to appear

# find_element_by_css_selector was removed in Selenium 4; use find_element with By
price = driver.find_element(By.CSS_SELECTOR, "span.ng-binding.ng-scope").text
print(price)

driver.quit()

Output:

1.00 USD = 73.9375 Indian Rupee (INR)
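
If only the numeric rate is needed (73.9375 in the output above), a regular expression can pull it out of the extracted text. This is a rough sketch that assumes the text always has the form "<amount> <currency> = <rate> ..."; the function name parse_rate is hypothetical.

```python
import re


def parse_rate(text):
    """Return the number that follows '=' as a float, or None if there is none."""
    match = re.search(r"=\s*([0-9]+(?:\.[0-9]+)?)", text)
    return float(match.group(1)) if match else None


print(parse_rate("1.00 USD = 73.9375 Indian Rupee (INR)"))
print(parse_rate("1.00 TK = 779.8"))
```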
MendelG
  • I'm getting an error: AttributeError: 'NoneType' object has no attribute 'find_next' – itgeek Nov 11 '20 at 16:13
  • @itgeek The page is probably loaded dynamically. See [my answer](https://stackoverflow.com/a/64143412/12349734) using `selenium` to scrape a dynamic page. – MendelG Nov 11 '20 at 16:18
  • I tried this as well: print(soup.find('span', id="driveValue")), and it prints "None". – itgeek Nov 11 '20 at 16:23
  • Your solution works when I want to extract a value from a string such as "html = ''' 1.00 TK = 779.8Disk Drive Value(DDV) ''' ". – itgeek Nov 11 '20 at 16:41
  • Here I don't have a string. I posted an HTML snippet from the page content. When I run your solution for my requirement, I get this error: AttributeError: 'NoneType' object has no attribute 'find_next' – itgeek Nov 11 '20 at 16:42
  • @itgeek please share the URL – MendelG Nov 11 '20 at 19:14
  • URL: https://www.westernunion.com/us/en/web/send-money/start?SrcCode=12345&ReceiveCountry=IN&SendAmount=100&ISOCurrency=CNY&FundsOut=BA&FundsIn=CreditCard – itgeek Nov 11 '20 at 20:32
  • The element has id="smoExchangeRate", and the value that needs to be extracted is "73.9375". – itgeek Nov 11 '20 at 20:32
  • I'm new to web scraping and have just started learning. :-) – itgeek Nov 11 '20 at 20:35
  • Thanks. I was aware of the Selenium option; I was wondering whether I can do it without Selenium. – itgeek Nov 12 '20 at 02:17
-2

Hope this helps.

from lxml import etree

txt = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

# etree.fromstring parses the snippet as XML, so it must be well-formed
root = etree.fromstring(txt)
# contains() does a substring match on the class attribute
for span in root.xpath('//span[contains(@class, "ng-binding ng-scope")]'):
    print(span.text.strip())

Output:

1.00 TK = 779.8
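
Selecting by id is also possible with lxml. The sketch below uses lxml.html.fromstring, which tolerates real-world HTML, and passes it a plain string (for a live page that would be the response text, not a BeautifulSoup object). The helper name text_by_id and the sample page are assumptions for illustration, and this still cannot find an element that JavaScript adds after the page loads.

```python
from lxml import html as lxml_html


def text_by_id(page_text, element_id):
    """Return the leading text of the element with this id, or None."""
    tree = lxml_html.fromstring(page_text)
    # $element_id is an XPath variable filled in from the keyword argument.
    matches = tree.xpath("//*[@id=$element_id]", element_id=element_id)
    if not matches:
        return None
    text = matches[0].text
    return text.strip() if text else None


# Hypothetical page wrapping a simplified copy of the span from the question.
sample_page = (
    "<html><body><div>"
    '<span id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8'
    "<span>Disk Drive Value</span>(DDV) </span>"
    "</div></body></html>"
)

print(text_by_id(sample_page, "driveValue"))
```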
Samsul Islam
  • Thanks, Samsul. Is there any way to extract it using the ID? – itgeek Nov 11 '20 at 16:09
  • Why use XML parsing when the OP has the BeautifulSoup tag? – MendelG Nov 11 '20 at 16:09
  • Isn't it possible using BeautifulSoup? – itgeek Nov 11 '20 at 16:09
  • Yes, it is possible to extract using the id. I use lxml because it is an easy-to-use library for processing XML and HTML in Python. – Samsul Islam Nov 11 '20 at 16:16
  • It works fine when I just use a string, but here I need to read the content of the page and extract the value. – itgeek Nov 11 '20 at 17:50
  • page = requests.get() soup = BeautifulSoup(page.content, 'html.parser') root = etree.fromstring(soup) print(root) for td in root.xpath('//span[contains(@class, "ng-binding ng-scope")]'): print(td.text) – itgeek Nov 11 '20 at 17:56
  • error: root = etree.fromstring(soup) File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring File "src/lxml/parser.pxi", line 1895, in lxml.etree._parseMemoryDocument ValueError: can only parse strings – itgeek Nov 11 '20 at 17:57
  • This may help: https://stackoverflow.com/questions/36449369/python-xpath-lxml-etree-xpathevalerror-invalid-predicate – Samsul Islam Nov 12 '20 at 18:32