50

I was teaching C to my younger brother, who is studying engineering. I was explaining to him how different data types are actually stored in memory. I explained the reasoning behind signed/unsigned numbers and the floating-point representation of decimal numbers. While telling him about the char type in C, I also took him through the ASCII code system and how a char is stored as a 1-byte number.

He asked me why 'A' has been given ASCII code 65 and not anything else. Similarly, why is 'a' given the code 97 specifically? Why is there a gap of 6 ASCII codes between the range of capital letters and small letters? I had no idea. Can you help me understand this, since it has made me very curious as well? I've never found a book that discusses this topic.

What is the reason behind this? Are ASCII codes logically organized?

Tamás Kovács
__curious_geek
  • While it's fine to talk about floats and decimals in a non-technical manner, most of the floats out there in the wild are binary floating point, not decimal floating point, which is the source of lots of confusion for programmers. It's sort of like teaching that the sun orbits the earth - fine for kids to understand night and day, but confusing for budding rocket scientists. – Matt Curtis Jun 16 '10 at 01:14
  • Related: [Things Every Hacker Once Knew](http://www.catb.org/esr/faqs/things-every-hacker-once-knew/) (about ASCII and related technologies) – Nathan Long Jan 27 '17 at 19:25
  • The gap is to align the upper and lower alphabet the same way relative to a `%32` boundary, making this work: [What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?](https://stackoverflow.com/a/54585515) – Peter Cordes Dec 26 '20 at 02:18

7 Answers

76

There are historical reasons, mainly to make ASCII codes easy to convert:

Digits (0x30 to 0x39) have the binary prefix 110000:

0 is 110000
1 is 110001
2 is 110010

etc. So if you wipe out the prefix (the first two '1's), you end up with the digit in binary coded decimal.
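As a rough illustration of the digit trick, here is a minimal C sketch. The function names `digit_value` and `digit_char` are made up for this example, and the code assumes an ASCII execution character set.

```c
/* Hypothetical helpers, assuming the platform uses ASCII. */

/* Convert an ASCII digit to its numeric value by clearing the
   0x30 prefix bits. Returns -1 if c is not a digit. */
int digit_value(char c)
{
    if (c < '0' || c > '9')
        return -1;
    return c & 0x0F;            /* '7' is 0x37, and 0x37 & 0x0F is 7 */
}

/* Inverse direction: set the 0x30 prefix on a value in 0..9.
   The caller is assumed to pass a value in that range. */
char digit_char(int d)
{
    return (char)(0x30 | d);    /* 9 becomes 0x39, which is '9' */
}
```

With these definitions, `digit_value('7')` gives 7 and `digit_char(9)` gives `'9'`. In portable C you would normally write `c - '0'` instead, which works in any character set because the standard guarantees the digits are contiguous.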

Capital letters have the binary prefix 1000000:

A is 1000001
B is 1000010
C is 1000011

etc. Same thing, if you remove the prefix (the first '1'), you end up with alphabet-indexed characters (A is 1, Z is 26, etc).
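The same idea for capital letters can be sketched in C as follows. The name `upper_index` is invented for this example, and ASCII is assumed.

```c
/* Hypothetical helper, assuming the platform uses ASCII. */

/* Return the 1-based alphabet position of a capital letter by
   clearing the 0x40 prefix bit. Returns 0 for anything else. */
int upper_index(char c)
{
    if (c < 'A' || c > 'Z')
        return 0;
    return c & 0x3F;    /* 'A' is 0x41 -> 1, 'Z' is 0x5A -> 26 */
}
```

Here `upper_index('A')` is 1 and `upper_index('Z')` is 26.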

Lowercase letters have the binary prefix 1100000:

a is 1100001
b is 1100010
c is 1100011

etc. Same as above. So if you add 32 (100000) to a capital letter, you have the lowercase version.
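A small C sketch of that conversion, assuming ASCII; the function names are made up for illustration, and real code would normally call `tolower`/`toupper` from `<ctype.h>` instead.

```c
/* Hypothetical helpers, assuming the platform uses ASCII. */

/* Capital letter -> lowercase by adding 32; other characters unchanged. */
char to_lower_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char)(c + 32);
    return c;
}

/* Lowercase letter -> capital by subtracting 32; other characters unchanged. */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - 32);
    return c;
}
```

The range check matters: adding 32 to a non-letter such as `'1'` would produce an unrelated character.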

Nathan Long
FWH
  • But why is 'A' 65 rather than 64? Any encoding has some degree of logic and some degree of arbitrariness. – Jim Balter Dec 11 '12 at 20:29
  • @JimBalter Because they wanted the alphabet to be 1-indexed. 1 is A, 26 is Z. – Nathan Long Sep 08 '16 at 13:52
  • @NathanLong There is no reason to want that and no evidence that it is true. And 65 is not 1. Subtracting 63 from the letter would yield 1-indexing. – Jim Balter Sep 08 '16 at 18:46
  • @JimBalter Sorry, I should have put a question mark - "Because they wanted the alphabet to be 1-indexed?" I'm speculating. As far as 65 being 1, this answer says "Capital letters have the binary prefix 1000000", which is 64. So if you remove that prefix (subtract 64), A is 01 (1), B is 10 (2), etc. – Nathan Long Sep 08 '16 at 20:32
13

This chart from Wikipedia shows it quite well. Notice the two columns of control characters, two of upper case, and two of lower case, with the gaps filled in with miscellaneous symbols. (ASCII chart on Wikipedia)

Also bear in mind that ASCII was developed based on what had come before. For more detail on the history of ASCII, see this superb article by Tom Jennings, which also includes the meaning and usage of some of the stranger control characters.

Community
Mesh
6

Here is a very detailed history and description of ASCII codes: http://en.wikipedia.org/wiki/ASCII
In short:

  • ASCII is based on teleprinter encoding standards
  • the first 32 characters are "nonprintable" control characters, used for text formatting
  • then come the printable characters, roughly in the order they are placed on a keyboard. Check your keyboard:
    • space,
    • the upper case symbols on the number keys: !, ", #, ...,
    • numbers
    • symbols usually placed at the end of the keyboard row with numbers - upper case
    • capital letters, alphabetically
    • symbols usually placed at the end of the keyboard rows with letters - upper case
    • small letters, alphabetically
    • symbols usually placed at the end of the keyboard rows with letters - lower case
zendar
  • Some older keyboards (I know the Atari 800 was one) had the " character on the 2 key, so the correspondence between encoding and keyboard order was closer. – dan04 Jun 09 '10 at 05:05
  • @dan04 That's interesting. British keyboards have the " character on the 2 key even today. And typewriters as well IIRC. – Alexander Klauer Sep 26 '18 at 09:06
5

The distance between A and a is 32. That's quite a round number, isn't it?

The gap of 6 characters between capital letters and small letters is because (32 - 26) = 6. (Note: there are 26 letters in the English alphabet).
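To see which six characters sit in that gap, here is a minimal C sketch. The function name `gap_chars` is invented for this example, and it assumes an ASCII execution character set.

```c
#include <stddef.h>

/* Hypothetical helper, assuming the platform uses ASCII.
   Copies every character strictly between 'Z' and 'a' into buf,
   NUL-terminates it, and returns how many were copied.
   buf must have room for at least 7 bytes. */
size_t gap_chars(char *buf)
{
    size_t n = 0;
    for (char c = 'Z' + 1; c < 'a'; ++c)
        buf[n++] = c;       /* codes 91 through 96 */
    buf[n] = '\0';
    return n;
}
```

On an ASCII system this returns 6 and fills the buffer with ``[\]^_` ``, which are the codes 91 through 96 that pad the 26 capital letters out to the next multiple of 32.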

Grzegorz Oledzki
  • The English alphabet has 26 characters if you are making naïve assumptions about borrowed words. – Rich Seller Jul 16 '09 at 08:40
  • Actually ï is the same letter as i but with a diacritical mark. And while English borrowed quite a few words I don't think it borrowed letters like þ (Icelandic) or IJ (Dutch). – MSalters Jul 16 '09 at 09:09
  • Take a look at this, https://en.wikipedia.org/wiki/Thorn_(letter), @MSalters. þ was the old way to write the "th" phoneme. Also check out https://en.wikipedia.org/wiki/Ye_olde. – mazunki Mar 07 '19 at 11:17
  • Not only that, but the alphabets don't span a mod-32 boundary so you can toggle a single bit to flip case, instead of having to actually add or subtract (with carry propagation) to go one way or the other. [What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?](https://stackoverflow.com/a/54585515) – Peter Cordes Dec 26 '20 at 02:21
0
  • 'A' is 0x41 in hexadecimal.
  • 'a' is 0x61 in hexadecimal.
  • '0' through '9' are 0x30 - 0x39 in hexadecimal.

So at least it is easy to remember the numbers for A, a and 0-9. I have no idea about the symbols. See the Wikipedia article on ASCII ordering.

too much php
0

If you look at the binary representations for 'a' and 'A', you'll see that they only differ by 1 bit, which is pretty useful (turning upper case to lower case or vice-versa is just a matter of flipping a bit). Why start there specifically, I have no idea.
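The single differing bit is 0x20 (32), so the flip can be sketched in C with an XOR. The name `toggle_case` is made up for this example, and ASCII is assumed.

```c
/* Hypothetical helper, assuming the platform uses ASCII. */

/* Flip the case of a letter by toggling bit 5 (0x20).
   Non-letters are returned unchanged. */
char toggle_case(char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return (char)(c ^ 0x20);    /* 'a' 0x61 <-> 'A' 0x41 */
    return c;
}
```

Because each alphabet fits inside one 32-code block, toggling that bit never carries into the other bits, which is why no addition or subtraction is needed.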

Tal Pressman
0

Wikipedia:

The code itself was structured so that most control codes were together, and all graphic codes were together. The first two columns (32 positions) were reserved for control characters.[14] The "space" character had to come before graphics to make sorting algorithms easy, so it became position 0x20.[15] The committee decided it was important to support upper case 64-character alphabets, and chose to structure ASCII so it could easily be reduced to a usable 64-character set of graphic codes.[16] Lower case letters were therefore not interleaved with upper case. To keep options open for lower case letters and other graphics, the special and numeric codes were placed before the letters, and the letter 'A' was placed in position 0x41 to match the draft of the corresponding British standard.[17] The digits 0–9 were placed so they correspond to values in binary prefixed with 011, making conversion with binary-coded decimal straightforward.

beggs