-5

I want to parse some tags. The markup looks like this:

<div id="tags">blah-blah<a href="http://url/tag">What_I_Want</a></div>

I thought this would work:

re.findall(">"."</a></div>")

but it didn't.

What's wrong with it?

------------ Update I -------------

Now I know that re is not a good fit for HTML.

Raj gave me an answer:

>>> from bs4 import BeautifulSoup
>>> s = '<div id="tags">blah-blah<a href="http://url/tag">What_I_Want</a></div>'
>>> soup = BeautifulSoup(s, 'html.parser')
>>> soup.select('div > a:nth-of-type(1)')[0].text
'What_I_Want'

I also have another question. How can I find every

<div id blah blah </div>

in an entire file?

E.Laemas Kim

2 Answers

1

It seems you're trying to get the text of the a tag that is an immediate child of the div tag.

>>> from bs4 import BeautifulSoup
>>> s = '<div id="tags">blah-blah<a href="http://url/tag">What_I_Want</a></div>'
>>> soup = BeautifulSoup(s, 'html.parser')
>>> soup.select('div > a:nth-of-type(1)')[0].text
'What_I_Want'
>>> soup.select('div > a')[0].text
'What_I_Want'
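
For the follow-up about finding every div that carries an id across a whole file, one possible sketch is below. The sample document and its id values are made up for illustration, and "html.parser" is just one parser choice among several.

```python
from bs4 import BeautifulSoup

# Hypothetical document standing in for the contents of a whole file.
html = """
<html><body>
<div id="header">Site title</div>
<div class="note">no id here</div>
<div id="tags">blah-blah<a href="http://url/tag">What_I_Want</a></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# id=True matches any <div> that has an id attribute, whatever its value.
divs_with_id = soup.find_all("div", id=True)
ids = [div["id"] for div in divs_with_id]
print(ids)  # ['header', 'tags']

# A specific id can be looked up directly.
tags_div = soup.find("div", id="tags")
print(tags_div.a.get_text())  # What_I_Want
```

To run this against a real file, an open file object can be passed to BeautifulSoup in place of the string.
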
Avinash Raj
0

Short answer: you can't.

Different short answer: use a Python XML parser (it even has examples).
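
As a rough illustration of the parser route, here is a minimal sketch using the standard library's html.parser module. That module is my assumption about a suitable parser, not necessarily the one meant above, and the class and function names (TagLinkText, link_texts) are hypothetical. It collects the text of each a tag found inside the div whose id is "tags".

```python
from html.parser import HTMLParser


class TagLinkText(HTMLParser):
    """Collect the text of every <a> found inside <div id="tags">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside the target div
        self.in_link = False  # True while inside an <a> within that div
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Enter the target div, or track a div nested inside it.
            if self.div_depth or dict(attrs).get("id") == "tags":
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_link = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.results[-1] += data


def link_texts(html):
    parser = TagLinkText()
    parser.feed(html)
    parser.close()
    return parser.results


s = '<div id="tags">blah-blah<a href="http://url/tag">What_I_Want</a></div>'
print(link_texts(s))  # ['What_I_Want']
```

Unlike a regular expression, the parser tracks which element it is currently inside, so extra attributes, whitespace, or nested divs do not break the match.
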

Davide