I want to parse some articles on a news site, but bs4 can't see some of the tags.

My code:

from bs4 import BeautifulSoup
import urllib.request

url = "http://www.noi.md/md/news_id/86602"
page = urllib.request.urlopen(url)

soup = BeautifulSoup(page.read(), "html5lib")

heads = soup.find_all('h3')

for head in heads:
    print(head.string)

Result:

>>> 
None
Citiţi de asemenea:
Adăugați un comentariu:
Citiţi de asemenea:
>>> 

As you can see, it finds some of the tags but not all of them. There is one that remains hidden:

<h3>
Debutul companiei „<a href="http://viorica.md">Viorica-Cosmetic</a>” în calitate de participant al Festivalului „Lavender Fest” a fost încărcat cu emoții pozitive și oferte tentante pentru vizitatori.
</h3>

Is it me, or is it a bs4/HTML problem?

1 Answer

Taken from this answer:

.string on a Tag object returns a NavigableString object. .text, on the other hand, gets all the child strings and returns them concatenated using the given separator. The return type of .text is a unicode object (str in Python 3).

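Your <h3> is actually found: it is the None at the top of your output. Because it contains an <a> tag, it has more than one child, and .string returns None in that case. Change head.string to head.text in your code.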
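
To make the difference between .string and .text concrete, here is a minimal sketch. It parses an inline HTML string instead of fetching the page; the markup is a shortened copy of the <h3> from the question plus one plain heading, and it uses the built-in "html.parser" rather than html5lib so nothing extra has to be installed. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page: one <h3> with a nested <a>, one plain <h3>.
html = (
    '<h3>Debutul companiei „<a href="http://viorica.md">Viorica-Cosmetic</a>” '
    'în calitate de participant al Festivalului „Lavender Fest”.</h3>'
    '<h3>Citiţi de asemenea:</h3>'
)

soup = BeautifulSoup(html, "html.parser")
mixed, plain = soup.find_all("h3")

# .string only works when the tag has exactly one child to descend into.
string_of_mixed = mixed.string  # None: text, <a>, text are three children
string_of_plain = plain.string  # the single text child

# .text (shorthand for get_text()) joins every nested string.
text_of_mixed = mixed.text

print(repr(string_of_mixed))
print(string_of_plain)
print(text_of_mixed)
```

If you need a separator between the pieces or want whitespace trimmed, call head.get_text() with its separator and strip arguments instead of using the .text property.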

Community
Bubble Hacker