
I am a beginner in Python, working on a web scraping project. In the project, I want to look up the meaning and part of speech (POS) of some words in the Cambridge Dictionary and export them to Excel.

This is my code:

    pip install bs4
    pip install requests
    from bs4 import BeautifulSoup
    import requests
    headers = {"User-Agent": "xxxxxxx"}
    r = requests.get('https://dictionary.cambridge.org/dictionary/english/happy', headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    POS = soup.find_all("span", class_="pos dpos")
    print(POS)

Result:

    [<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>, <span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>]

From this result, I only want to get the word 'adjective', but I don't know how to do that. Can anyone help me? Many thanks.

luk2302
pyt
  • Welcome @pyt. Please follow this guide on asking questions: https://stackoverflow.com/help/how-to-ask – Devang Sanghani Feb 07 '22 at 10:32
  • You can parse the HTML like here: https://stackoverflow.com/questions/11804148/parsing-html-to-get-text-inside-an-element – David Apr 07 '22 at 12:44

2 Answers


First off: remove the pip install commands from your script. They are shell commands, not Python, and a library only needs to be installed once. After that you can use it by importing it, as you did in lines 3 and 4.

You already use an attribute with the name you're looking for in your code: .text. Tags returned by BeautifulSoup have a .text attribute too. Store a span in a variable and then read its text with varname.text.
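As a minimal sketch of that idea, the snippet below parses a small inline HTML string instead of fetching the live page. SAMPLE_HTML is an assumption modelled on the output shown in the question, and the names pos_tags, first_tag and pos_text are only illustrative.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page (assumed markup, modelled on the
# output in the question), so the example runs without a network call.
SAMPLE_HTML = (
    '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'
    '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pos_tags = soup.find_all("span", class_="pos dpos")

first_tag = pos_tags[0]    # store one span in a variable
pos_text = first_tag.text  # the text between <span ...> and </span>
print(pos_text)
```

With the real page, soup would be built from r.text as in the question, and the rest should work the same way as long as the page still uses that class name.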

SYNEC

Agreeing with the other answer, you should remove these two lines:

     pip install bs4
     pip install requests

as they are not needed. Also, your problem is that the variable POS is a list containing two "span" tags. What you can do is iterate through the list, printing the text of each element:

    for span in POS:
        print(span.text)

This should print "adjective" twice, once for each element. If you only want to print it for a specific span, access it by index (for example POS[0]) and then call ".text" on it to get the text.
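If the goal is to keep the words rather than just print them, one possible approach is to collect the text into a list and drop repeats. This is only a sketch: SAMPLE_HTML is assumed markup modelled on the question's output, the variable names are illustrative, and the de-duplication step is an addition not mentioned in the question.

```python
from bs4 import BeautifulSoup

# Assumed markup standing in for the real page.
SAMPLE_HTML = (
    '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'
    '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pos_tags = soup.find_all("span", class_="pos dpos")

# Text of every match, in page order, with surrounding whitespace stripped.
pos_texts = [tag.get_text(strip=True) for tag in pos_tags]

# Drop repeated values while keeping the original order.
unique_pos = list(dict.fromkeys(pos_texts))

# Text of one specific match, picked by index.
second_text = pos_tags[1].text

print(pos_texts)
print(unique_pos)
print(second_text)
```

A list like unique_pos is also a convenient shape to hand to whatever you later use for the Excel export.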

The reason you're getting a list is that find_all returns every element that matches. Class names are not unique to a single HTML element, so a search by class name can match several elements, and the result is always a list.
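To see the difference in practice, here is a small sketch comparing find_all with find, which returns only the first match or None. SAMPLE_HTML is assumed markup modelled on the question's output, and the class name "not-on-this-page" is made up purely to show the no-match case.

```python
from bs4 import BeautifulSoup

# Assumed markup standing in for the real page.
SAMPLE_HTML = (
    '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'
    '<span class="pos dpos" title="A word that describes a noun or pronoun.">adjective</span>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

all_matches = soup.find_all("span", class_="pos dpos")    # list of every match
first_match = soup.find("span", class_="pos dpos")        # only the first match
no_match = soup.find("span", class_="not-on-this-page")   # None when nothing matches

print(len(all_matches))
print(first_match.text)
print(no_match)

# find can return None, so check before reading .text.
pos_text = first_match.text if first_match is not None else ""
```

If you only ever need the first part of speech on the page, find saves you the indexing step, at the cost of having to handle the None case.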

Hope this helps :)