Using Python, I want to crawl data on a web page whose source if quite big (it is a Facebook page of some user).
Say the URL is the URL I am trying to crawl. I run the following code:
import urllib2
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
Data is supposed to contain the source of the page I am crawling, but for some reason, it doesn't contain all the characters that are available when I compare directly with the source of the page. I don't know what I am doing wrong. I know that the page I am trying to crawl has not been updated recently, so it is not due to the fact that I am missing some very recent data.
Does someone have a clue?
EDIT: the kind of information I am missing is like:
<code class="hidden_elem" id="up82eq_33"><!-- <div class="mbs profileInfoSection"><div class="uiHeader uiHeaderTopAndBottomBorder uiHeaderSection infoSectionHeader"><div class="clearfix uiHeaderTop"><div><h4 tabindex="0" class="uiHeaderTitle">Basic Information</h4></div></div></div><div class="phs"><table class="uiInfoTable mtm profileInfoTable uiInfoTableFixed"><tbody><tr><th class="label">Networks</th><td class="data"><div class="uiCollapsedList uiCollapsedListHidden" id="up82eq_32"><span class="visible">XXXX</span></div></td></tr></tbody></table></div></div> --></code>
It's basically some field I am interested in. What surprises me is that I can get some fields, but not all.