C++ Windows Sockets: Downloading an html file

Question

Given:
Suppose that I have a website called "exampledomain.com", and that on that website, I have one file called "my_doc.html", the full URL address of which is "https://www.exampledomain.com/my_directory/my_doc.html". (Not my actual website; this is just hypothetical).

Objective:
I'm trying to develop a Client-Side Application, using C++ & Windows Sockets, that downloads my HTML file, parses it, extracts some specific information, runs some calculations, and displays its results to the user.

Question:
How do I download the HTML file from the server to the directory "C:/ExampleDirectory/" on the client-side computer, using the Windows Sockets Library?

Clarification:
I want to write this client-side program to work with the existing website. That is, I want it to download the file in the same way that an Internet browser like Microsoft Edge would.

Edit: Just to clarify, the server uses a secure, account-based system, and thus the document would be transferred using HTTPS. I'm not really sure if this would affect the solution, but I thought it'd be worth mentioning.

The HTTP protocol has quite a lot of details to implement, so we use specialized HTTP client libraries such as WinHTTP. See http://stackoverflow.com/questions/822714/how-to-download-a-file-with-winhttp-in-c-c – c-smile, Feb 16 '16 at 01:29

score 2 · Accepted Answer · edited May 23 '17 at 12:07

Don't.

A socket library is not an appropriate tool for talking to a web server. HTTP is complex enough that you want to use a specialized HTTP library. There are several such libraries available; libcurl springs to mind. And of course there is WinHTTP; see the WinHttp tag https://stackoverflow.com/tags/winhttp/info.

And for the HTML part, you'd want to use an HTML parsing library to extract the desired info.

edited May 23 '17 at 12:07

Community

answered Feb 16 '16 at 01:29

Captain Giraffe

On the contrary. HTTP is one of the easiest protocols to implement, on the client side. This shouldn't take more than a couple dozen lines to implement. getaddrinfo(), socket(), connect(), write(), read(). You're done. The only thing one needs to know is the format of HTTP messages: "GET HTTP/1.0\r\nHost: \r\n\r\n". The End. This is not rocket science. P.S. Eh, maybe another six lines of code to handle proxies. – Sam Varshavchik Feb 16 '16 at 01:32
@Sam So what do you do for a 300? Character encoding? Compression? – Captain Giraffe Feb 16 '16 at 01:33
That might have been true back in the heydays of 0.9. Not so much nowadays. – Captain Giraffe Feb 16 '16 at 01:34
If you don't send the appropriate header indicating that you accept compressed content, you won't get it. Ditto for encoding. If you don't specify the moon in Accept-Encoding:, you don't have to worry about encoding. And, as far as handling redirects, you got me: another dozen, or so, lines of code to suffer through. – Sam Varshavchik Feb 16 '16 at 01:37
@SamVarshavchik Did you know about continuation lines? – user253751 Feb 16 '16 at 01:50
Yes, only in HTTP/1.1. You don't have to worry about them if you send an HTTP/1.0 request. HTTP is a well-designed protocol. – Sam Varshavchik Feb 16 '16 at 02:27
@SamVarshavchik Your statement about this being doable using Windows Sockets is very interesting to me. I would be very appreciative if you could provide a more detailed answer. – Feb 16 '16 at 03:03
There's plenty of documentation on Google about using the sockets API, as well as documentation for HTTP/1.1 (although, as I've noted, you should stick to simple HTTP/1.0 features, because that's all that's needed here). The most valuable skill a software developer can acquire is learning how to track down technical documentation, read it, and understand it. – Sam Varshavchik Feb 16 '16 at 03:11
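
To make the message format discussed in the comments concrete, here is a minimal sketch of the two protocol-level pieces: building an HTTP/1.0 GET request and reading the status code and Location header (the redirect case raised above) from the start of a response. The names build_get_request, ResponseHead, parse_response_head and is_redirect are hypothetical helpers invented for illustration; the host and path in the usage note come from the example URL in the question. The sketch deliberately contains no socket calls, and it ignores compression, character encodings and folded header lines. Note also that the question's URL is https, so the bytes could not simply be written to a plain TCP socket; a TLS layer, or a library such as WinHTTP or libcurl, would still be needed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.0 GET request.
std::string build_get_request(const std::string& host, const std::string& path) {
    assert(!host.empty());
    assert(!path.empty() && path[0] == '/');  // request target must start with '/'
    std::string request;
    request += "GET " + path + " HTTP/1.0\r\n";
    request += "Host: " + host + "\r\n";
    request += "\r\n";  // blank line ends the header block
    return request;
}

// Hypothetical summary of the part of a response that precedes the body.
struct ResponseHead {
    int status = 0;                               // 0 if missing or malformed
    std::string location;                         // Location header value, if any
    std::size_t body_offset = std::string::npos;  // index of the first body byte
};

// Case-insensitive comparison, since header names are case-insensitive.
static bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) return false;
    }
    return true;
}

// Parses the status line and headers from the bytes received so far.
// Returns a default ResponseHead until the full header block has arrived.
ResponseHead parse_response_head(const std::string& raw) {
    ResponseHead head;
    const std::size_t head_end = raw.find("\r\n\r\n");
    if (head_end == std::string::npos) return head;  // headers incomplete
    head.body_offset = head_end + 4;

    // Status line looks like: HTTP/1.0 200 OK
    const std::size_t line_end = raw.find("\r\n");
    const std::string status_line = raw.substr(0, line_end);
    const std::size_t sp = status_line.find(' ');
    if (sp != std::string::npos && status_line.size() >= sp + 4) {
        int code = 0;
        bool digits = true;
        for (std::size_t i = sp + 1; i <= sp + 3; ++i) {
            const unsigned char c = static_cast<unsigned char>(status_line[i]);
            if (!std::isdigit(c)) { digits = false; break; }
            code = code * 10 + (c - '0');
        }
        if (digits) head.status = code;
    }

    // Walk the header lines looking for "Location: <url>".
    std::size_t pos = line_end + 2;
    while (pos < head_end) {
        const std::size_t eol = raw.find("\r\n", pos);
        const std::string line = raw.substr(pos, eol - pos);
        const std::size_t colon = line.find(':');
        if (colon != std::string::npos && iequals(line.substr(0, colon), "Location")) {
            std::size_t v = colon + 1;
            while (v < line.size() && (line[v] == ' ' || line[v] == '\t')) ++v;
            head.location = line.substr(v);
        }
        pos = eol + 2;
    }
    return head;
}

// True for the common redirect codes that carry a new URL to fetch.
bool is_redirect(const ResponseHead& head) {
    const int s = head.status;
    return (s == 301 || s == 302 || s == 303 || s == 307 || s == 308) &&
           !head.location.empty();
}
```

A caller would pass build_get_request("www.exampledomain.com", "/my_directory/my_doc.html") to send(), accumulate the recv() output in a std::string, and call parse_response_head on it. When is_redirect returns true, the request would be repeated against head.location; otherwise everything from body_offset onward is the document to save.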
