BeautifulSoup4 can't find h3 tag on the page

Question

I want to parse some articles on a news site, but bs4 can't see some of the tags.

My code:

from bs4 import BeautifulSoup
import urllib.request

url = "http://www.noi.md/md/news_id/86602"
page = urllib.request.urlopen(url)

soup = BeautifulSoup(page.read(), "html5lib")

heads = soup.find_all("h3")

for head in heads:
    print(head.string)

Result:

>>> 
None
Citiţi de asemenea:
Adăugați un comentariu:
Citiţi de asemenea:
>>>

As you can see, it finds some of the tags but not all of them. There is one that remains hidden:

<h3>
Debutul companiei „<a href="http://viorica.md">Viorica-Cosmetic</a>” în calitate de participant al Festivalului „Lavender Fest” a fost încărcat cu emoții pozitive și oferte tentante pentru vizitatori.
</h3>

Is it my mistake, or is it a bs4/HTML problem?

Try `head.text` instead of `.string`... – Bubble Hacker, Jun 24 '16 at 09:10
Thanks, Bubble Hacker! It works! – Александр Никифоров, Jun 24 '16 at 09:16
Wrote as answer for future documentation. – Bubble Hacker, Jun 24 '16 at 09:24

score 0 · Answer 1 · edited May 23 '17 at 12:14

Taken from this answer:

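`.string` on a Tag object returns a NavigableString, but only when the tag has exactly one child. If the tag has more than one child, as the hidden `<h3>` does (text, an `<a>` tag, then more text), `.string` is None, which is most likely the None at the top of your output. `.text`, on the other hand, gets all the child strings and returns them concatenated using the given separator. The return type of `.text` is a unicode object (`str` in Python 3).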

So change `head.string` to `head.text` in your loop.
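To see the difference without hitting the site, here is a minimal sketch that parses a shortened inline copy of the heading from the question. The variable names (`mixed`, `simple`, `mixed_text`) are just for illustration, and I am assuming the built-in `html.parser` here instead of `html5lib` so nothing extra needs to be installed; the behaviour of `.string` and `.text` should be the same with either parser.

```python
from bs4 import BeautifulSoup

# Shortened inline copy of the markup from the question (assumption: the
# real page is not fetched, so this only illustrates the behaviour).
html = (
    '<h3>\n'
    'Debutul companiei „<a href="http://viorica.md">Viorica-Cosmetic</a>” '
    'în calitate de participant al Festivalului „Lavender Fest”.\n'
    '</h3>'
    '<h3>Citiţi de asemenea:</h3>'
)

soup = BeautifulSoup(html, "html.parser")
mixed, simple = soup.find_all("h3")

# The first <h3> has three children (text, <a>, text), so .string has no
# single string to return and gives None.
print(mixed.string)  # None

# .text concatenates every string inside the tag, including the link text.
mixed_text = mixed.text.strip()
print(mixed_text)

# The second <h3> has exactly one text child, so both attributes work.
print(simple.string)
print(simple.text)
```

If you want more control over whitespace, `head.get_text()` does the same job as `.text` and also accepts a separator and a `strip` flag.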

answered Jun 24 '16 at 09:20

Bubble Hacker

