
Because HTTP and HTTPS are already 8-bit clean, there is no need to use an encoding designed for channels that are not 8-bit clean (such as Base64). We can encode using all 8 bits.

Are there any inherent limitations? That is, what governs what can be represented by 8 bits, or 256 permutations?

I noticed that UTF-8's single-byte form can only represent 128 permutations, because the MSB must be 0 to signal that a 1-byte representation is used. So this is not a possibility.
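For example (a quick sketch using the standard TextDecoder API), a lone byte with the MSB set is rejected when decoded strictly as UTF-8:

```typescript
// Illustration of the 1-byte UTF-8 limit: only values 0x00-0x7F are valid on their own.
const decoder = new TextDecoder("utf-8", { fatal: true });

console.log(decoder.decode(new Uint8Array([0x41]))); // "A": MSB is 0, valid single byte

try {
  decoder.decode(new Uint8Array([0x80]));            // MSB is 1: a continuation byte, invalid alone
} catch (e) {
  console.log("0x80 is not a valid standalone UTF-8 byte:", (e as Error).message);
}
```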

What are the limitations in creating a system that uses all 8 bits specifically for transmitting data over an 8-bit clean channel?

The only requirement is that the data must be visibly represented using 256 symbols.

employee-0

1 Answer


HTTP (or any protocol/system) being 8-bit clean does not mean that you can simply use any 8-bit value wherever you want within the protocol. It means only that the protocol or system is capable of handling 8-bit encoding given the right circumstances.

For example, HTTP uses carriage return + line feed (hex values 0D 0A) to delimit header fields and to separate the headers from the body of the message, so you can't use those two bytes together anywhere in the headers. Further, the headers and body may have limitations on their character encoding based on what type of data they contain. If the HTTP Content-Type is set to text/html; charset=utf-8, characters in the body like < (hex value 3C) are reserved for HTML tags. The HTTP body may be 8-bit clean, but that doesn't mean you can put any 8-bit content you want in it; you still have to conform to UTF-8 (or some other encoding) and abide by the content rules that HTML imposes.
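To make that concrete, here is a rough TypeScript sketch of how an HTTP response is laid out; the header values are only illustrative:

```typescript
// CRLF (0x0D 0x0A) ends every header line, and a blank line (CRLF CRLF) separates
// the headers from the body, so those bytes can't appear freely inside a header value.
const rawResponse =
  "HTTP/1.1 200 OK\r\n" +
  "Content-Type: text/html; charset=utf-8\r\n" +
  "\r\n" +                              // end of headers
  "<p>body starts here</p>";

// Inside a text/html body, 0x3C ('<') opens a tag, so arbitrary binary containing that
// byte would be misread as markup unless it is escaped or encoded first.
const escaped = String.fromCharCode(0x3c).replace("<", "&lt;"); // "&lt;"
console.log(rawResponse.split("\r\n\r\n")[0], escaped);
```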

The purpose of Base64 is to encode arbitrary binary data for use inside other encoding schemes where characters other than [A-Za-z0-9+/] are reserved for special uses, or are totally invalid (such as inside HTML, or in a URL query string). You cannot just replace Base64 with a full 8-bit encoding scheme because an 8-bit scheme is not valid in situations where Base64 is necessary. This is true even if the protocol you're using is, itself, 8-bit clean.
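As an illustration (a sketch using the browser's btoa and encodeURIComponent; the URL is a made-up placeholder), Base64 turns arbitrary bytes into characters that survive those contexts:

```typescript
// Bytes that would be unsafe raw: NUL, CR, LF, '<', and 0xFF.
const bytes = new Uint8Array([0x00, 0x0d, 0x0a, 0x3c, 0xff]);

// Base64 maps them onto [A-Za-z0-9+/=], which is legal inside HTML and URLs.
const base64 = btoa(String.fromCharCode(...bytes));   // "AA0KPP8="
console.log(base64);

// Safe to embed in a query string ('+' and '/' still need percent-encoding there).
const url = "https://example.com/upload?data=" + encodeURIComponent(base64);
console.log(url);
```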

In short, the binary encoding scheme you can use depends on much more than whether the channel is 8-bit clean. It depends on the protocol you're using the encoding inside of, what the protocol's control characters are, and in which situations those characters are reserved.

Update:

If all you're really looking to do is return raw binary in an HTTP response, just set the HTTP Content-Type to application/octet-stream. This will allow you to return arbitrary binary in the HTTP body without any need for encoding.
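For instance, a minimal Node sketch (the port and payload here are arbitrary placeholders) that serves raw binary this way:

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  const payload = Buffer.from([0x00, 0x0d, 0x0a, 0x3c, 0xff]); // arbitrary binary, no encoding
  res.writeHead(200, {
    "Content-Type": "application/octet-stream",
    "Content-Length": payload.length,
  });
  res.end(payload); // the raw bytes go straight into the HTTP body
}).listen(8080);
```

On the client, something like `fetch(url).then(r => r.arrayBuffer())` hands the bytes back untouched.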

Syon
  • I'm not sure that these special values that HTTP uses are reserved in any way. For example, 0D0A is also this symbol in Unicode: http://www.i2symbol.com/cool-letters/malayalam/x0D0A-malayalam-letter-uu. Can you provide a reference for the special characters that HTTP uses? You would not suggest this Unicode character cannot be used over HTTP? The same goes for the method I am proposing. Thanks. – employee-0 Sep 26 '13 at 12:46
  • Unicode can be used in the HTTP body if you set the Content-Type header's charset to unicode, but unicode values _are not_ valid in HTTP headers. See the [HTTP RFC](http://asg.web.cmu.edu/rfc/rfc1945.html) section 2.2 concerning CRLF. Also see [this question](http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding) about how characters outside of [ISO 8859-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1) need to be encoded if you want to include them in HTTP headers. – Syon Sep 26 '13 at 13:18
  • This question was not aimed at HTTP headers. Looking into this further, I will be using Ajax GET requests, which do not use "Content-type". My guess is because it is a pure bit stream without any limitations. Any special characters are escaped prior to sending and after receiving. – employee-0 Sep 26 '13 at 13:21
  • What this all comes down to is: yes, you can use a special encoding you've invented yourself in HTTP, but you can't just use whatever you want. Your special encoding must conform to whatever content type and charset is specified in the `Content-Type` header. If the charset were UTF-8, the entire body must conform to UTF-8 or a subset of it (such as ASCII). This is why Base64 works inside an HTML document (it converts arbitrary binary to a set of non-reserved ASCII characters), and a full 8-bit encoding scheme won't (because not all 8-bit values are valid inside of HTML). – Syon Sep 26 '13 at 13:33
  • If you're just returning binary data via a GET, the Content-Type would be `application/octet-stream`, in which case no special encoding is necessary because this content type allows raw binary in the HTTP body. – Syon Sep 26 '13 at 13:36
  • Yes... that was what I was thinking... application/octet-stream... Is there any way I can verify the content type in JavaScript? – employee-0 Sep 26 '13 at 13:39
  • See the getResponseHeader(String) method of [XMLHttpRequest](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest). – Syon Sep 26 '13 at 13:48
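Following up on that last comment, here is a minimal sketch of the check with XMLHttpRequest (the endpoint path is a placeholder):

```typescript
const xhr = new XMLHttpRequest();
xhr.open("GET", "/some/binary/endpoint");
xhr.responseType = "arraybuffer";                      // request raw bytes, not text
xhr.onload = () => {
  const contentType = xhr.getResponseHeader("Content-Type");
  if (contentType === "application/octet-stream") {
    const bytes = new Uint8Array(xhr.response as ArrayBuffer);
    console.log("received", bytes.length, "raw bytes");
  } else {
    console.log("unexpected Content-Type:", contentType);
  }
};
xhr.send();
```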