Programmatically save webpage AFTER all loading scripts have run

Question

I need to save a webpage. Using mechanize, I can save the HTML of the root page. However, there are many scripts that run after the page is loaded, filling several parts of the page with data that I would like to save to file.

I'm fairly sure a library exists for this, but I can't remember its name and haven't found it in my many searches. I believe it acts just like a web browser: it loads a page, runs whatever JavaScript the page needs, and returns the final page as it would be displayed in a real browser.

I'm not sure whether it was for Python 2 or 3, but either would work.

score 1 · Accepted Answer · edited May 23 '17 at 12:02

That library is Selenium. http://www.youtube.com/watch?v=g54xYVMojos is a video I made some time ago that shows Selenium in action. For sample usage, see my answer to "How to load all entries in an infinite scroll at once to parse the HTML in python".
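
As a rough illustration of the approach, here is a minimal sketch using Selenium's Python bindings. The helper names (`wait_until_loaded`, `save_rendered_page`, `main`), the example URL, the output filename, and the choice of Firefox are all assumptions made for this example, not part of the original answer. It also assumes that `document.readyState == "complete"` is a good enough signal that the page has finished; content filled in by later AJAX calls may need an explicit wait on a specific element instead (for example with Selenium's `WebDriverWait`).

```python
import time


def wait_until_loaded(driver, timeout=30.0, poll=0.5):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete', False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        time.sleep(poll)
    return False


def save_rendered_page(driver, url, out_path, timeout=30.0):
    """Load `url` in the browser and write the post-JavaScript DOM to `out_path`."""
    driver.get(url)
    wait_until_loaded(driver, timeout=timeout)
    html = driver.page_source  # the DOM serialized after scripts have run
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


def main():
    # Imported here so the helpers above can be used (or tested) without Selenium.
    # Assumes `pip install selenium` and a locally installed Firefox.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        # Hypothetical URL and filename, for illustration only.
        save_rendered_page(driver, "http://example.com", "page.html")
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

Calling `main()` should open a browser window, load the page, and write the rendered HTML to `page.html`. The helpers take the driver as a parameter, so the same code should work with other Selenium drivers such as `webdriver.Chrome()`.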

edited May 23 '17 at 12:02

Community

answered Jan 10 '14 at 06:15

praveen

+1 I tried a few more searches, and found exactly that, though couldn't confirm it without testing yet. I'll give you the answer, and thanks for the confirmation. Grats! – CDspace Jan 10 '14 at 06:24

score 0 · Answer 2 · answered Jan 10 '14 at 06:13

I think I found what I was looking for: Selenium! It also has a Python package. I'll update this answer and the question for searchability if it turns out to be what I'm looking for.

answered Jan 10 '14 at 06:13

CDspace

FWIW see [praveen](http://stackoverflow.com/a/21037397/1007939)'s answer. Exactly what I had found, and probably the answer I was remembering – CDspace Jan 10 '14 at 06:26
