
Here is a fragment of the site's HTML:

<td class='vcard' id='results100212571'>   
 <h2 class="custom_seeMore">
  <a class="fn openPreview" href="link.html">Hotel Name<span class="seeMore">See More...</span></a>
 </h2> 
 <div class='clearer'></div> 
 <div class='adr'>
  <span class='postal-code'>00000</span> 
  <span class='locality'>City</span> 
  <span class='street-address'>Address</span>
 </div>
 <p class="tel">Phone number</p>

and I am trying to parse it with BeautifulSoup:

for element in BeautifulSoup(page).findAll('td'):
    if element.find('a', {'class' : 'fn openPreview'}):
        print element.find('a', {'class' : 'fn openPreview'}).string
    if element.find('span', {'class' : 'postal-code'}):
        print element.find('span', {'class' : 'postal-code'}).string
    if element.find('span', {'class' : 'locality'}):
        print element.find('span', {'class' : 'locality'}).string
    if element.find('span', {'class' : 'street-address'}):
        print element.find('span', {'class' : 'street-address'}).string
    if element.find('p', {'class' : 'tel'}):
        print element.find('p', {'class' : 'tel'}).string

I know it's very amateur code, but it almost works: every class prints its content except 'fn openPreview'. For that one,

print element.find('a', {'class' : 'fn openPreview'}).string 

prints None.

How can I extract the hotel name from that link?

Kubas
  • Maybe because fn and openPreview are separate classes. An element can have multiple space-separated classes. – SiggyF Mar 23 '11 at 22:43
  • Oddly enough, it looks like BeautifulSoup treats `fn openPreview` as a single class. See this question: http://stackoverflow.com/questions/1242755/beautiful-soup-cannot-find-a-css-class-if-the-object-has-other-classes-too – Josh Rosen Mar 23 '11 at 22:48

1 Answer


According to the BeautifulSoup documentation, element.string is None when element has more than one child. Here the <a> tag has two children: the text "Hotel Name" and the <span class="seeMore"> element.

In your case,

print element.find('a', {'class' : 'fn openPreview'}).contents[0].string

will print "Hotel Name".
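For readers on a current setup, here is a minimal sketch of the same idea. It assumes Python 3 and the bs4 package (BeautifulSoup 4) rather than the BeautifulSoup 3 API used in the question, and it wraps a trimmed copy of the fragment in a table row so that it parses on its own. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the fragment from the question, wrapped in a table row.
html = """
<table><tr><td class='vcard' id='results100212571'>
 <h2 class="custom_seeMore">
  <a class="fn openPreview" href="link.html">Hotel Name<span class="seeMore">See More...</span></a>
 </h2>
 <p class="tel">Phone number</p>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# In bs4, class_ matches any single class of a multi-class attribute.
link = soup.find("a", class_="openPreview")

whole = link.string             # None, because the <a> has two children
first = str(link.contents[0])   # the text node that precedes the <span>
phone = soup.find("p", class_="tel").string  # single child, so .string works

print(whole)
print(first)
print(phone)
```

Indexing contents[0] relies on the hotel name being the first child of the link. If the markup can vary, iterating over link.stripped_strings and taking the first item is a somewhat more forgiving alternative.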

Josh Rosen