1

I want to grab the related links from this page: http://support.apple.com/kb/TS1538

If I use Inspect Element in Chrome or Safari, I can see the <div id="outer_related_articles"> and all the articles listed. If I fetch the page and parse it with BeautifulSoup, I get everything except the related articles.

Here's what I have so far:

import urllib2
from bs4 import BeautifulSoup
url = "http://support.apple.com/kb/TS1538"
response = urllib2.urlopen(url)
soup = BeautifulSoup(response.read())
print soup
Paul Roub
Matthew
  • BeautifulSoup is only a parser. I think your problem is more likely with `urlopen`. Have you checked to see if the appropriate elements have, in fact, been included _before_ you try to parse it? – Joel Cornett Apr 07 '13 at 19:37

2 Answers

5

This section is loaded using JavaScript. Disable JavaScript in your browser to see how BeautifulSoup "sees" the page.

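From here you have two options:

  • Use a tool that executes JavaScript, such as a headless browser, and parse the rendered page.
  • Work out how the page loads the content and make the same request yourself.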

After some digging, it seems the page makes a request to this address (http://km.support.apple.com/kb/index?page=kmdata&requestid=2&query=iOS%3A%20Device%20not%20recognized%20in%20iTunes%20for%20Windows&locale=en_US&src=support_site.related_articles.TS1538&excludeids=TS1538&callback=KmLoader.receiveSuccess) and uses JSONP to load the results, with KmLoader.receiveSuccess being the name of the receiving function. Use Firebug or Chrome dev tools to inspect the page in more detail.
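
A JSONP response is a JSON value wrapped in a call to the callback named in the URL, so it cannot be passed to a JSON parser directly. The following is a minimal sketch of unwrapping it, not a definitive implementation. The helper name parse_jsonp and the sample payload are hypothetical, since the real response's fields are not shown here; in practice the text would be the body returned by requesting the address above.

```python
import json
import re


def parse_jsonp(text, callback):
    """Strip a JSONP wrapper such as `callback({...});` and decode the JSON inside."""
    # Match: the callback name, "(", the payload, ")", and an optional ";".
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in a call to " + callback)
    return json.loads(match.group(1))


# Hypothetical response body, used only to illustrate the wrapper format.
sample = 'KmLoader.receiveSuccess({"results": [{"title": "Example article"}]});'
data = parse_jsonp(sample, "KmLoader.receiveSuccess")
```

Matching on the exact callback name makes the helper fail loudly if the server returns something other than the expected wrapper, such as an HTML error page.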

Emil Ivanov
3

I ran into a similar problem: HTML content that is created dynamically may not be captured by BeautifulSoup. A very basic solution is to wait a few seconds before capturing the page source. Alternatively, use Selenium, which can wait for an element to appear and then proceed. The former worked for me:

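import time

# ... code that loads the page in `browser` goes here
# (assumption: `browser` is a Selenium WebDriver, which provides page_source)

time.sleep(5)  # wait 5 seconds for the JavaScript to run; 1 second worked too
html_source = browser.page_source

# ... parse html_source with bs4 and do whatever you want with it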
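
A fixed sleep either wastes time or is too short on a slow connection. As a rough sketch of the "wait for an element, then proceed" idea, the helper below polls the page source until a marker string appears or a timeout expires. The name wait_for_marker and its parameters are hypothetical, and it assumes `browser` is a Selenium WebDriver as in the snippet above. Selenium also provides its own explicit waits (WebDriverWait), which are likely the better choice in real code.

```python
import time


def wait_for_marker(get_source, marker, timeout=10.0, interval=0.5):
    """Poll get_source() until marker appears in it; return the source or raise on timeout."""
    deadline = time.time() + timeout
    while True:
        source = get_source()
        if marker in source:
            return source
        if time.time() >= deadline:
            raise RuntimeError("timed out waiting for " + repr(marker))
        time.sleep(interval)


# Usage with a Selenium WebDriver named `browser` (an assumption):
# html_source = wait_for_marker(lambda: browser.page_source, "outer_related_articles")
```

Passing a callable rather than the driver itself keeps the helper independent of Selenium, so it can be tried out with any function that returns the current HTML.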
Ibo