2

I'm a C++ programmer and I'm new to web development. I need to figure out how to log/dump the HTML of a dynamic third-party website to a static HTML file on my computer, every second. The dynamic web page refreshes every second and updates an HTML table with the latest price info. I would like a static snapshot of this table (or the whole HTML page) to be saved to disk every second, so that I can parse the file with my own program and add the updated price info to a database. How do I do this? If I can't do it this way, is there a way to eavesdrop on (and log) the POST/GET requests and replies the dynamic web page sends?

petke
    I think a better solution would be to just scrape the page online instead of saving it first. – PeeHaa Oct 02 '11 at 17:14
  • What do you do if the request to the dynamic page takes longer than a second? – hakre Oct 02 '11 at 17:24
  • Just to clarify: the third-party dynamic web page already refreshes itself in the client browser every second, so my scraping would be of the passive-observer kind. It wouldn't increase the load on the site. I'll have a look at the cURL library. – petke Oct 02 '11 at 18:10

3 Answers

1

Look into the cURL library. I believe scraping the content from the website, doing your processing/business logic, and then inserting into or updating your database would be the most efficient way to do it, rather than saving the file's contents to disk.

Alternatively, PHP's file_get_contents() works pretty well, assuming you have allow_url_fopen enabled.
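
Since the asker is working in C++, the same idea can be sketched with libcurl (the C API behind the cURL tool), which works fine from C++. This is only a rough sketch of the fetch-and-process loop the answer describes: the URL is a placeholder, the parsing/database step is left as a comment, and error handling is minimal.

```cpp
// A minimal fetch-and-process loop using libcurl from C++.
// The URL below is a placeholder; error handling is kept to a minimum.
#include <curl/curl.h>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

// libcurl write callback: append each chunk of the response to a std::string.
static size_t write_cb(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    for (;;) {  // poll once per second, matching the page's refresh rate
        std::string html;
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/prices");  // placeholder URL
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &html);

        if (curl_easy_perform(curl) == CURLE_OK) {
            // Parse the price table out of `html` and update the database here,
            // instead of writing the snapshot to disk.
            std::cout << "fetched " << html.size() << " bytes\n";
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    // (unreachable in this sketch) curl_easy_cleanup(curl); curl_global_cleanup();
}
```

Keeping the page in memory and processing it directly, as the answer suggests, avoids the extra disk write and the risk of your parser reading a half-written snapshot file.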

Dan LaManna
0

This would be easy to do with Selenium WebDriver. Selenium lets you create a browser object with a method, getPageSource, that pulls the entire HTML of the page, but there don't seem to be any C++ bindings for Selenium. If it's convenient to use Ruby, Python, or Java as part of your application just to open up a browser (or headless browser) and pull the data, then you should be able to set up a web service or a local file to transfer that data back into your C++ application. A sketch of the file hand-off is shown below.

Web automation from C++ (a related question) addresses the challenge of there being no Selenium C++ bindings.

Or, alternatively, you could write your own C++ bindings for Selenium (probably more difficult).

However, for simply pulling the HTML you may not need Selenium at all, if Dan's answer above works for you.
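
If you do go the Selenium route, the hand-off back into C++ can be as simple as the Selenium script (in Python, Ruby, or Java) overwriting a local file with getPageSource once a second, and the C++ program re-reading it. A rough sketch of the C++ side, assuming a hypothetical snapshot.html written by such a script:

```cpp
// Minimal C++ side of the hand-off: re-read the snapshot file the Selenium
// script keeps overwriting, and pass its contents on to your own parser.
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

int main() {
    for (;;) {
        std::ifstream in("snapshot.html");   // placeholder filename written by the script
        if (in) {
            std::ostringstream buf;
            buf << in.rdbuf();               // slurp the whole file
            std::string html = buf.str();
            // TODO: parse the price table out of `html` and update the database.
            std::cout << "read " << html.size() << " bytes\n";
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```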

emery
-1

Hi.

Instead of fetching their page every second to record their data so you can have an updated view of their prices, why not call their web service directly (the one their AJAX call makes)?

Good luck.
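
The endpoint the page polls can usually be found in the browser's developer tools (the network tab shows the request it fires every second). Once you have it, the same libcurl approach as in the first answer applies; here is a sketch against a hypothetical endpoint URL, where the response is typically a small JSON or XML payload rather than a full page:

```cpp
// Same libcurl approach as above, but aimed at the (hypothetical) AJAX
// endpoint that returns just the price data instead of the whole page.
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t write_cb(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    std::string body;
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/prices.json");  // placeholder endpoint
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    if (curl_easy_perform(curl) == CURLE_OK) {
        std::cout << body << "\n";  // parse the JSON/XML with a library of your choice
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```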

megakorre
  • Thanks. I could probably look into that, although I'm a bit worried I'll get my IP banned if I do something different from what the website itself would do. The way I see it, it's all fair play if I just passively log the HTML sent to my browser. – petke Oct 02 '11 at 18:16