Given:
Suppose that I have a website called "exampledomain.com", and that on that website, I have one file called "my_doc.html", the full URL address of which is "https://www.exampledomain.com/my_directory/my_doc.html". (Not my actual website; this is just hypothetical).

Objective:
I'm trying to develop a Client-Side Application, using C++ & Windows Sockets, that downloads my HTML file, parses it, extracts some specific information, runs some calculations, and displays its results to the user.

Question:
How do I download the HTML file from the server to the directory "C:/ExampleDirectory/" on the client-side computer, using the Windows Sockets library?

Clarification:
I want to write this client-side program to work with the existing website. I.e., I want it to download the file in the same way that an Internet browser like Microsoft Edge would.

Edit: Just to clarify, the server uses a secure, account-based system, and thus the document would be transferred using HTTPS. I'm not really sure if this would affect the solution, but I thought it'd be worth mentioning.

  • The HTTP protocol has quite a lot of details to implement, so we use specialized HTTP client libraries like WinHTTP. Check http://stackoverflow.com/questions/822714/how-to-download-a-file-with-winhttp-in-c-c – c-smile Feb 16 '16 at 01:29

1 Answer

Don't.

A socket library is not an appropriate tool for talking to a web server. HTTP is complex enough that you want to use a specialized HTTP library. There are several such libraries available; libcurl springs to mind. And of course there is WinHTTP; see the winhttp tag: https://stackoverflow.com/tags/winhttp/info.

And for the HTML part, you'd want to use an HTML parsing library to extract the desired info.

Captain Giraffe
  • On the contrary. HTTP is one of the easiest protocols to implement on the client side. This shouldn't take more than a couple dozen lines to implement: getaddrinfo(), socket(), connect(), write(), read(). You're done. The only thing one needs to know is the format of HTTP messages: "GET <path> HTTP/1.0\r\nHost: <hostname>\r\n\r\n". The End. This is not rocket science. P.S. Eh, maybe another six lines of code to handle proxies. – Sam Varshavchik Feb 16 '16 at 01:32
  • @Sam So what do you do for a 300? Character encoding? Compression? – Captain Giraffe Feb 16 '16 at 01:33
  • That might have been true back in the heyday of 0.9. Not so much nowadays. – Captain Giraffe Feb 16 '16 at 01:34
  • If you don't send the appropriate header indicating that you accept compressed content, you won't get it. Ditto for encoding. If you don't specify the moon in Accept-Encoding:, you don't have to worry about encoding. And, as far as handling redirects, you got me: another dozen, or so, lines of code to suffer through. – Sam Varshavchik Feb 16 '16 at 01:37
  • @SamVarshavchik Did you know about continuation lines? – user253751 Feb 16 '16 at 01:50
  • Yes, only in HTTP/1.1. Don't have to worry about them if you send an HTTP/1.0 request. HTTP is a well-designed protocol. – Sam Varshavchik Feb 16 '16 at 02:27
  • @SamVarshavchik Your statement about this being doable using Windows Sockets is very interesting to me. I would be very appreciative if you could provide a more detailed answer. –  Feb 16 '16 at 03:03
  • There's plenty of documentation on Google about using the sockets API, as well as the documentation for HTTP/1.1 (although, as I've noted, you should stick to simple HTTP/1.0 features, because that's all that's needed here). The most valuable skill a software developer can acquire is learning how to track down technical documentation, read it, and understand it. – Sam Varshavchik Feb 16 '16 at 03:11
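
For readers following the comment thread, the message-format side of the disagreement can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions, not anyone's actual implementation: the helper names (`build_get_request`, `parse_status_code`, `find_header`, `is_redirect`) are hypothetical, and the sketch assumes a well-formed response with CRLF line endings. It covers formatting the HTTP/1.0 request and spotting a redirect only. It does not include the socket calls, and it does nothing about TLS, which the HTTPS requirement in the question would need on top of a plain socket.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: builds the minimal HTTP/1.0 GET request from the comments.
std::string build_get_request(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}

// Hypothetical helper: extracts the 3-digit status code from "HTTP/1.x SSS reason".
std::optional<int> parse_status_code(const std::string& response) {
    if (response.compare(0, 5, "HTTP/") != 0) return std::nullopt;
    const std::size_t space = response.find(' ');
    if (space == std::string::npos || space + 4 > response.size()) return std::nullopt;
    int code = 0;
    for (std::size_t i = space + 1; i < space + 4; ++i) {
        const char c = response[i];
        if (c < '0' || c > '9') return std::nullopt;
        code = code * 10 + (c - '0');
    }
    return code;
}

// Case-insensitive comparison; header field names are not case-sensitive.
bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) return false;
    }
    return true;
}

// Hypothetical helper: returns the value of the first header named `name`.
// Assumes CRLF line endings and no folded (continuation) header lines.
std::optional<std::string> find_header(const std::string& response, const std::string& name) {
    const std::size_t headers_end = response.find("\r\n\r\n");
    std::size_t line_start = response.find("\r\n");  // end of the status line
    while (line_start != std::string::npos && line_start < headers_end) {
        line_start += 2;  // skip the CRLF that ended the previous line
        const std::size_t line_end = response.find("\r\n", line_start);
        if (line_end == std::string::npos) break;
        const std::string line = response.substr(line_start, line_end - line_start);
        const std::size_t colon = line.find(':');
        if (colon != std::string::npos && iequals(line.substr(0, colon), name)) {
            std::size_t v = colon + 1;
            while (v < line.size() && (line[v] == ' ' || line[v] == '\t')) ++v;
            return line.substr(v);
        }
        line_start = line_end;
    }
    return std::nullopt;
}

// Status codes that normally carry a Location header to follow.
bool is_redirect(int code) {
    return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
}
```

A caller would send the string from `build_get_request("www.exampledomain.com", "/my_directory/my_doc.html")` over a connected socket, read until the peer closes the connection, and, if `is_redirect` is true for the parsed status code, repeat the request against the URL in the `Location` header. Compression, character encoding, and chunked transfer are deliberately left out, which is roughly the gap the answer is pointing at.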