
I want to extract the translation of a word from an online dictionary. For example, here is the HTML for 'car':

<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>

How can I extract "any vehicle on wheels" in Python with BeautifulSoup or any other module?

CCovey
Sara Santana

3 Answers


There are multiple ways to reach the desired element.

Probably the simplest would be to find it by class:

soup.find('span', class_='def').text

or, with a CSS selector:

soup.select('span.def')[0].text

or, additionally checking the parents:

soup.select('ol.level_1 > li.level_1 > span.def')[0].text

or:

soup.select('ol.level_1 > li[value="1"] > span.def')[0].text
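
As a self-contained sketch of the options above: this assumes the fragment from the question is held in a string (the name `html` and the closing `</ol>` tag are additions for illustration) and is parsed with the built-in `html.parser`. It uses `select_one`, which returns the first match directly instead of a list. The attribute value is quoted because recent BeautifulSoup releases delegate selectors to soupsieve, which rejects an unquoted value that starts with a digit.

```python
from bs4 import BeautifulSoup

# HTML fragment from the question; the closing </ol> is added for completeness.
html = """
<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# Find by tag name and class.
by_class = soup.find("span", class_="def").text

# CSS selector; select_one returns the first match or None.
by_selector = soup.select_one("span.def").text

# CSS selectors that also check the parent elements.
by_parents = soup.select_one("ol.level_1 > li.level_1 > span.def").text
by_value = soup.select_one('ol.level_1 > li[value="1"] > span.def').text

print(by_class)
```

If the page may not contain the element at all, check the result of `find` or `select_one` for `None` before reading `.text`.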
alecxe

I solved it with BeautifulSoup:

import bs4

soup = bs4.BeautifulSoup(html, 'html.parser')
q1 = soup.find('li', class_='sense_list_item level_1', value='1').text
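
One caveat with this approach: passing a string with two class names to `class_` matches the exact value of the `class` attribute, so it stops matching if the site ever reorders the classes. The sketch below illustrates this on a hypothetical fragment (the reordered classes are an assumption made for the demonstration, not what the dictionary page contains) and shows a CSS selector that is order-independent.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: same item as in the question, but with the classes reordered.
html = (
    '<ol><li class="level_1 sense_list_item" value="1">'
    '<span class="def">any vehicle on wheels</span></li></ol>'
)
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: fails once the order changes.
exact = soup.find("li", class_="sense_list_item level_1", value="1")

# CSS selector: requires both classes, in any order.
flexible = soup.select_one('li.sense_list_item.level_1[value="1"]')

print(exact)          # None
print(flexible.text)  # any vehicle on wheels
```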
Sara Santana

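Assuming that is the only HTML given, you can use NLTK 2.x. Note that clean_html() was dropped in NLTK 3, where calling it raises NotImplementedError and recommends BeautifulSoup's get_text() instead.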

import nltk

# load the HTML chunk into the variable htmlstring first
extract = nltk.clean_html(htmlstring)
print(extract)
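
For readers on NLTK 3 or later, where `clean_html` is no longer implemented, a rough equivalent can be sketched with BeautifulSoup's `get_text()`. This is a substitute technique, not what the answer above uses; `htmlstring` follows the answer's naming, and the closing `</ol>` is added for completeness.

```python
from bs4 import BeautifulSoup

# Fragment from the question, held in the answer's htmlstring variable.
htmlstring = """
<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""

# get_text() drops all tags; strip=True trims whitespace around each text piece
# and skips pieces that are empty after trimming.
extract = BeautifulSoup(htmlstring, "html.parser").get_text(strip=True)
print(extract)
```

Like `clean_html`, this returns all text in the fragment, so it only isolates the definition when the fragment contains nothing else.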
FTA