Why is Mechanize not downloading the full page?

Question

I am using Mechanize to sign in to LinkedIn and get all the employees of a certain company. However, when I download the page with the employee search results, the whole middle section is missing, and I have no idea why.

Here is my code (I took out my LinkedIn sign-in info):

from mechanize import Browser
from bs4 import BeautifulSoup

br = Browser()
br.set_handle_robots(False)  # do not obey robots.txt
br.open('https://www.linkedin.com/')

# Fill in and submit the login form
br.select_form('login')
br['session_key'] = 'YOUR_EMAIL_HERE'
br['session_password'] = 'YOUR_PASSWORD_HERE'
response = br.submit()

# Fetch the employee search results and save them to disk
page = br.open('https://www.linkedin.com/vsearch/p?f_CC=10667')
html = page.read()
soup = BeautifulSoup(html, 'html.parser')
text = soup.prettify().encode('ascii', 'ignore')
with open('website.html', 'wb') as fo:
    fo.write(text)

The response is this (I recommend downloading the HTML and just looking at it with a browser): http://pastebin.com/7z1dPiTd

I am not sure whether I used the open function correctly; that may be the problem.

You'd better use [`linkedin api`](http://developer.linkedin.com/apis) instead. — alecxe, Apr 25 '14 at 23:01
I have looked into it, they do not provide any way to get the employees of a company — jped, Apr 25 '14 at 23:16
Is that data provided by AJAX ? If so, mechanize is not going to see it. — dilbert, Apr 25 '14 at 23:21

score 0 · Answer 1 · edited May 23 '17 at 12:28

Alright, after doing some research, it seems that Mechanize does not execute JavaScript, so the search results, which the page loads with JavaScript after the initial request, never showed up in the HTML I downloaded. Mechanize does not provide a method for running or waiting for JavaScript, so I have to use a tool that drives a real browser instead, such as Windmill or Selenium.
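
For anyone landing here, this is a minimal sketch of the Selenium route, not a tested LinkedIn scraper. It assumes Selenium is installed along with Firefox and its driver. The function names and the `#results` selector mentioned below are placeholders I made up, and the login step is left out. The idea is to let a real browser run the JavaScript, wait until an element that only exists after rendering shows up, and then grab the page source.

```python
def fetch_rendered_html(url, css_selector, timeout=15):
    """Open url in a real browser and return the HTML once css_selector appears.

    css_selector should match an element that the page's JavaScript inserts,
    so its presence means the dynamic content has been rendered.
    """
    # Imported here so the helper below still works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        # Block until the element exists; raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout


def save_html(html, path):
    """Write html to path as ASCII, dropping other characters (as in the question)."""
    with open(path, 'wb') as fo:
        fo.write(html.encode('ascii', 'ignore'))
```

Usage would look something like save_html(fetch_rendered_html('https://www.linkedin.com/vsearch/p?f_CC=10667', '#results'), 'website.html'), after replacing `#results` with a selector that really matches the results container and adding the sign-in steps before the search page is requested.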

answered Apr 27 '14 at 02:13 by jped · last edited by Community
