I often hear people say "download with HTTP". What does that really mean technically?

HTTP stands for Hyper Text Transfer Protocol. So to understand it literally, it is meant for text transferring. And I used some sniffer tool to monitor the wire traffic. What get transferred are all ASCII characters. So I guess we have to convert whatever we want to download into characters before transferring it via HTTP. Using HTTP URL encoding? Or some binary-to-text encoding scheme such as Base64? But that requires some decoding on the client side.

I always think it is TCP that can transfer whatever data, so I am guessing "HTTP download" is a misused term. It arises because we view a web page via HTTP, find a downloadable link on that page, and click it to download. In fact, the browser opens a TCP connection to download it. Nothing about HTTP.

Could anyone shed some light?

smwikipedia
  • You can display any raw byte data as ascii characters, that's all up to your interpreter. There's no conversion necessary. Normal packets are typically implemented as byte arrays anyways, which is essentially how strings are implemented as well. – Red Alert Nov 20 '13 at 03:02
  • @RedAlert So HTTP can transfer any raw byte data as if they are meaningful text? – smwikipedia Nov 20 '13 at 03:03
  • @RedAlert I just checked http://stackoverflow.com/questions/3538021/why-do-we-use-base64. It seems we have to use the Base64 encoding to make sure the data arrives intact. – smwikipedia Nov 20 '13 at 03:33
  • 4
    Have you looked at how images (which are binary data btw) are transferred over HTTP? Take your network sniffer or the network tab in your browser's developer tools... – PoByBolek Nov 20 '13 at 09:15
  • 1
    @smwikipedia the body of HTTP can be any data. However the receiver needs to know how to handle the data it receives, and in some cases, such as HTML it expects pure text. base64 comes in, in situations where you need to embed binary data in a text-only format. For example embedding a PNG image directly in HTML (which avoids the necessity of a second HTTP request) – Thayne Dec 05 '13 at 03:48
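The last comment's point about embedding binary data in a text-only format can be sketched in a few lines of Python. This is only an illustration: `png_bytes` is a hypothetical stand-in (just the 8-byte PNG signature, not a full image) and `to_data_uri` is a helper name made up for this example.

```python
import base64

# Hypothetical stand-in for image data: only the 8-byte PNG file signature.
png_bytes = b"\x89PNG\r\n\x1a\n"


def to_data_uri(data, mime_type="image/png"):
    """Wrap raw bytes in a base64 data URI so they can sit inside HTML text."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


# Base64 is needed here because HTML is a text format, not because of HTTP.
html = '<img src="%s">' % to_data_uri(png_bytes)
print(html)
```

The encoding is required by the HTML document, not by the transport. The same bytes requested as a separate image file would travel over HTTP unchanged.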

7 Answers

13

The complete answer to "What does HTTP download exactly mean?" is in the RFC 2616 specification, which you can read here: https://www.rfc-editor.org/rfc/rfc2616

Of course that's a long (but very detailed) document.

I won't replicate or summarize its content here.

In the body of your question you are more specific:

So to understand it literally, it is meant for text transferring.

I think the word "TEXT" is misleading you.

And

have to convert whatever we want to download into characters before transferring it via HTTP

is false. You don't necessarily have to.

A file, for example a JPEG image, may be sent over the wire without any kind of encoding. See for example this: When a web server returns a JPEG image (mime type image/jpeg), how is that encoded?
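A minimal sketch of that claim, using only Python's standard library. The names `PAYLOAD`, `RawBytesHandler` and `download_once` are invented for this example, and the payload is a hypothetical stand-in for a JPEG file (it simply contains every possible byte value). The server writes the bytes into the response body as they are, and the client reads back exactly the same bytes.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload standing in for a JPEG: all 256 byte values, repeated.
PAYLOAD = bytes(range(256)) * 4


class RawBytesHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        # The body is written as-is: no base64, no URL encoding.
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def download_once():
    """Start a throwaway local server, fetch PAYLOAD once, return the body."""
    server = HTTPServer(("127.0.0.1", 0), RawBytesHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(download_once() == PAYLOAD)
```

Only the status line and headers are text. Everything after the blank line that ends the headers is an opaque sequence of bytes.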

Note that a compression or encoding may optionally be applied (the most common case is GZIP for textual content like HTML, text, scripts...), but that depends on how the client and the server agree the data should be transferred. That "agreement" is made with the "Accept-Encoding" header in the request and the "Content-Encoding" header in the response.
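The negotiation can be sketched roughly as follows. This is an assumption-laden toy, not how any particular web server does it: `TEXT`, `GzipHandler` and `fetch` are hypothetical names, and the `"gzip" in accepted` check is a simplification that ignores quality values.

```python
import gzip
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TEXT = b"<p>hello, hypertext</p>" * 50  # hypothetical compressible content


class GzipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accepted = self.headers.get("Accept-Encoding", "")
        body = TEXT
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Simplified check: a real server would parse the header properly.
        if "gzip" in accepted:
            body = gzip.compress(TEXT)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(accept_encoding=None):
    """Return (Content-Encoding header, raw body bytes) for one request."""
    server = HTTPServer(("127.0.0.1", 0), GzipHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        headers = {"Accept-Encoding": accept_encoding} if accept_encoding else {}
        conn.request("GET", "/", headers=headers)
        response = conn.getresponse()
        raw = response.read()
        encoding = response.getheader("Content-Encoding")
        conn.close()
        return encoding, raw
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    encoding, raw = fetch("gzip")
    print(encoding, len(raw), gzip.decompress(raw) == TEXT)
```

A client that does not advertise gzip gets the original bytes back with no `Content-Encoding` header at all.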

Paolo
  • What about Audio and Video Files ? – Chandan Kumar Dec 05 '13 at 04:38
  • 2
    You have a video .mp4 or audio .mp3 on the server's hard disk. The same bytes are transferred from the server to the client (along with the response headers), unless, of course, the server is configured to apply some compression to the data (for example GZIP). When transferring jpg, mpeg, mp3... usually no further compression is applied, as it is computationally costly and doesn't reduce the data size. – Paolo Dec 05 '13 at 14:44
  • I hope all the answers live in the RFC. I will read it in detail. – smwikipedia Dec 09 '13 at 05:57
3

I understand the name is misleading you, but if you read Hyper Text Transfer Protocol as a Transfer Protocol with Hypertext capabilities, then it changes a bit.

When HTTP was developed there were already lots of protocols (for example, the IP protocol, which is how data are widely transmitted between servers on the internet), but there were no protocols that allowed for easy navigation between documents.

HTTP is a protocol that allows for transferring information AND for hypertext (i.e. links) embedded within text documents. These links don't necessarily have to point to other text documents, so you can basically transmit any information using HTTP (the sender and the receiver agree on the type of document being sent using something called the MIME type, which is carried in the Content-Type header).
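As a rough illustration of how a sender might pick that MIME type, here is a small sketch using Python's standard `mimetypes` module. The helper `content_type_for` and the file names are hypothetical; real servers often use their own configuration tables instead.

```python
import mimetypes


def content_type_for(filename):
    """Guess the MIME type a server could announce for this file name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic "just bytes" type when the extension is unknown.
    return guessed or "application/octet-stream"


for name in ("photo.jpg", "page.html", "blob.unknownext123"):
    print(name, "->", content_type_for(name))
```

The type only tells the receiver how to interpret the body. It does not change how the bytes are transferred.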

So the name still makes sense, even if you can send things other than text files.

Javier Ramirez
2

HTTP stands for Hyper Text Transfer Protocol. So to understand it literally, it is meant for text transferring.

Yes, text transferring. Not necessarily plain text, but all text. It doesn't mean that your text has to be readable by a person, just the computer.

And I used some sniffer tool to monitor the wire traffic. What get transferred are all ASCII characters.

Your sniffer tool knows that you're a person, so it won't just present you with 0s and 1s. It converts whatever it gets to ASCII characters to make it readable to you. All communication over the wire is binary. The ASCII representation is just there for your sake.
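To make that concrete, here is a small sketch of the kind of rendering a sniffer typically does. The function name `render_like_a_sniffer` is made up for this example, and actual tools differ in layout. Printable bytes are shown as ASCII characters and everything else as a dot, but the underlying data is the same bytes either way.

```python
def render_like_a_sniffer(data, width=16):
    """Render raw bytes as offset, hex values, and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Non-printable bytes are shown as "." purely for display.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%04x  %s  %s" % (offset, hex_part.ljust(width * 3 - 1), ascii_part))
    return "\n".join(lines)


# A text request line and the first four bytes of a typical JPEG file.
print(render_like_a_sniffer(b"GET / HTTP/1.1\r\n"))
print(render_like_a_sniffer(b"\xff\xd8\xff\xe0"))
```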

So I guess we have to convert whatever we want to download into characters before transferring it via HTTP

No, not at all. Again, it's text – not necessarily plain text.

I always think it is TCP that can transfer whatever data, [...]

Here you're right. TCP does transfer all data, but in a completely different layer. To understand this, let's look at the OSI model:

OSI Model

When you send anything over the network, your data goes through all the different layers. First, the application layer. Here we have HTTP and several others. Everything you send over HTTP goes through the layers, down through presentation and all the way to the physical layer.

So when you say that TCP transfers the data, you're right (HTTP could work over other transport protocols such as UDP, but that is rarely seen). But TCP transfers all your data, whether you download a file from a web server, copy a shared folder between computers on your local network, or send an email.

kba
1

If you look at the OSI model, HTTP is a protocol that lives in the application layer. So when you hear that someone uses "HTTP to transfer data", they are referring to an application layer protocol. An alternative would be FTP or NFS, for example.

The browser does indeed open a TCP connection when HTTP is used. TCP lives in the transport layer and provides a reliable connection on top of IP.
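The layering can be seen by speaking HTTP by hand over a plain TCP socket. The sketch below is an illustration only: `BODY`, `BinaryHandler`, `http_get_over_raw_tcp` and `demo` are hypothetical names, and it talks to a throwaway local server rather than a real website. The client opens a TCP connection, writes an HTTP request as bytes, and splits the reply into text headers and a binary body.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"\x00\x01binary\xff\xfe"  # hypothetical non-text payload


class BinaryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def http_get_over_raw_tcp(host, port, path="/"):
    """Send a hand-written HTTP/1.0 GET over TCP; return (header text, body bytes)."""
    request = ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection when it is done
                break
            chunks.append(chunk)
    # Headers and body are separated by the first blank line.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


def demo():
    server = HTTPServer(("127.0.0.1", 0), BinaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return http_get_over_raw_tcp("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head, body = demo()
    print(head.splitlines()[0], body == BODY)
```

TCP carries the bytes. HTTP is the agreed format of those bytes: a request line, headers, a blank line, and then the body.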

The HTTP protocol provides different verbs that can be used to retrieve and send data; GET and POST are the most common ones. Look up REST.

Eugene S.
0

HTTP can transfer "binary" data just fine. There is no need to convert anything.

Julian Reschke
0

HTTP is the protocol used to transfer your data, which in your case is whatever file you are downloading.

Severin
0

You can either do that (opening another type of connection) or you can send your data as raw text. What you'll send is just what you would see when opening the file in a text editor. Your browser just decides to save the file in your Downloads folder (or wherever you want it) because it sees that the file type is not supported (.rar, .zip).

Rooxo