-2

One quickly learns commands, say in C, of the form

printf("%d", x);

or

printf("%lu", x);

But no analog of %lu or %d exists for binary representations of x.

My question is firstly, why is this so, and secondly, at what point - at which level of abstraction - does the binary representation morph into decimal or hexadecimal?

Similar questions on the Stack network seem only to have elicited language-specific answers or implementation/library suggestions. My question, however, concerns my overall understanding of how the data is abstracted, and whether or not the OS ever sees binary, or if somehow something even lower-level than the OS covers it.

To further accentuate the direction I'm headed, consider a tangential question: would programming a source file in hex provide any performance benefit (speed or storage) over programming in decimal (during constant or variable initialization, for instance)?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Krpcannon
  • 111
  • 8
  • 2
    There is no binary format specifier because there is little demand for displaying numeric values to humans in binary representation. – Mark Benningfield Feb 24 '18 at 19:24
  • 2
    The `0`s and `1`s in the machine have meaning *in a context*. For example the 8-bit binary value `10000000` has the decimal value `128` if it represents an unsigned integer value, and `-128` if it represents a signed integer value. – Weather Vane Feb 24 '18 at 19:28
  • ... at the machine level, one way the context will be used is the processor flags the programmer (or compiler) tests after an arithmetic operation. Several flags are set as a result, but different ones are tested according to the context. – Weather Vane Feb 24 '18 at 19:31
  • 1
    You are confusing character representation with scalar numeric value. – Mark Benningfield Feb 24 '18 at 19:51
  • 2
    For the question of why there isn't a `%b` in `printf` for binary, see [this question](https://stackoverflow.com/questions/111928) and [this question](https://stackoverflow.com/questions/48008920). – Steve Summit Feb 24 '18 at 20:02

3 Answers

5

at which level of abstraction - does the binary representation morph into decimal or hexadecimal?

At no point at all. The CPU only ever sees sequences of 0s and 1s. When grouped together, these 0s and 1s can have a meaning; for example, a sequence of 32 of them can represent a 32-bit integer value.

We humans are bad at looking at 32 characters and calculating the value in our heads; that's why we use decimal, octal, and hexadecimal representations, because they are easier to handle. The scalar value 18 is a value that doesn't change, but its representation may change depending on the base you use: 18 in binary is 0001 0010, in octal 22, in hexadecimal 12, and in decimal 18.

The %d, %x, and %o conversion specifiers for printf allow us to print a scalar value as decimal, hexadecimal, and octal respectively. The %u specifier is for printing unsigned decimal values.

edit

Please address at which point those 0's and 1's in the CPU are recognized as anything else...

Perhaps what you have to understand first is that decimal, hexadecimal, octal, and binary are only representations of a scalar value. We humans use these representations to grasp the idea of a quantity. We choose a base, a set of digits each representing a fixed value. In decimal we have 10 digits: 0, 1, 2 ... 9. Each digit has a fixed value, and when we combine these digits, we can express values larger than 9. For example, the value represented by the sequence 123 is equal to:

3×10^0 + 2×10^1 + 1×10^2

That's why we call the digit on the right the units column, the digit in the middle the tens column, and the digit on the left the hundreds column.

at which point are the 0's and 1's rewritten into ASCII characters or numbers more meaningful to us?

They don't have any meaning at all to the CPU; they are just values, patterns of 0s and 1s. It's us humans (or rather the body that created the ASCII table) who give them a meaning, by saying that when a char variable has the value 48, we consider it to be '0', i.e. the character representation of the value 0. The CPU sees only sequences of 0s and 1s; we humans determine their meaning, and our algorithms determine what we do with these sequences of 0s and 1s.

You must not confuse values with their representations. Representations are only meaningful to us humans.

Pablo
  • 13,271
  • 4
  • 39
  • 59
0

A binary int or long is not the same thing as a string of ASCII '0' and '1' digits. An int is 32 bits / 4 bytes (on a typical C implementation), but a string with one character per bit is 32 bytes. The fact that ISO C doesn't define a conversion to print as base 2 text is basically unrelated to how computers store integers internally.

at which level of abstraction - does the binary representation morph into decimal or hexadecimal?

It doesn't morph, printf has to calculate the digit values of the hex, decimal, or whatever radix representation of the number. And also convert those digit values to ASCII characters, and store them in a buffer (or send them to the OS one at a time).

The usual algorithm is repeated modulo/division by the radix. From my answer to How do I print an integer in Assembly Level Programming without printf from the c library?:

char *itoa_end(unsigned long val, char *p_end) {
  const unsigned base = 10;
  char *p = p_end;
  do {
    *--p = (val % base) + '0';   // for hex, also need to handle the a-f range...
    val /= base;
  } while(val);                  // runs at least once to print '0' for val=0.

  // write(1, p,  p_end-p);
  return p;  // let the caller know where the leading digit is
}

There's no "magic" in calculating the string representation of a number, just math using normal code (which compiles to normal CPU instructions). It's no different from any other function that takes a number and stores some bytes in a char[] array.

libc printf implementations will use code like that to store characters into a buffer. glibc for example has an internal function exactly like this, storing backwards from the end of a buffer, called from printf and some other functions. Modulo produces the least-significant digit of the base-n representation, but that digit is last in printing order.

Real implementations with a variable base would special-case base 10, base 8, and base 16, because division by a compile-time constant is much faster than the arbitrary case. And division / modulo by a known power of 2 can compile to just shift / AND. But that's just an implementation detail. Although for power-of-2 bases, you can get the digits in printing order because they only depend on a range of bits in the binary integer, not all the other bits.

whether or not the OS ever sees binary, or if somehow something even lower-level than the OS covers it.

Actually printing the characters is separate from converting to a string representation, and (for printf) happens via the same mechanism that fwrite(3) would use. After being buffered by stdio, eventually a write() system call will ask the OS to copy some bytes to a file descriptor / handle.

Most OSes (including Windows and POSIX-like OSes such as Linux or OS X) only have system calls that read / write bytes from/to file descriptors / handles. The OS never sees the 4-byte binary integer, the C library does all the conversion in user-space.

Some CPU simulators like MARS or SPIM have "system calls" which read a string the user typed into a binary integer in a register, or the reverse. But normal OSes leave this up to user-space libraries.


would programming a source file in hex provide any benefit to performance (speed or storage) than programming in decimal (during constant or variable initialization, for instance)?

No, conversion to binary integer happens at compile time, so if the source is static int foo = 0xa, bar = 10; the object file only contains two 4-byte binary integers, each with the same bit-pattern representing the same value.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Your first sentence appears to be a sort of restatement of my question, and is in fact the source of my confusion. – Krpcannon Feb 24 '18 at 19:04
0

At no time are the numbers in the computer in decimal format.

The question is not when do they become binary but when do they become decimal.

You should be able to take the binary number 0b1111011 and convert it to 123 decimal and 0x7B hexadecimal, not by using the base conversion buttons on your calculator but by understanding how conversion between bases works, the same way you know that 3785 seconds is 1 hour, 3 minutes, and 5 seconds (base 60 from base 10).

The C library sees that you want decimal. It takes the bits 0b1111011, which up to that moment had no meaning (they were just bits), and after this they will go back to having no meaning; they only ever have meaning to you, not to the computer. To get the hundreds place, divide by 0b1100100 (100); the result is 0b1, so subtract 0b1100100 from 0b1111011 and you get 0b10111. Now divide that by 0b1010 (10); you get 0b10, so subtract 0b10100 (0b1010 times 0b10) from 0b10111 and get 0b11. The conversion to base 10 so far is 0b1, 0b10, 0b11. Now printf needs to make ASCII out of that, so it adds 0b110000 (48, the value of '0') to each of those numbers, giving 0b110001, 0b110010, 0b110011, and feeds that "string" into a character output routine (and you see 123). At no time do we have anything decimal; it's just bits being manipulated.

When you write some code

unsigned int x = 5;

The compiler converts that 5 (which is really 0b110101 in the source code file) into 0b101 and places that wherever it decides to store the variable x.

Now let's go back to 0b1111011 and convert that to hex. Starting from the right, take four bits at a time: you get 0b111 and 0b1011. This is significantly faster than the base-10 conversion so far (in general; just a bit faster for an 8-bit number, if that is what this was). There are two ways to turn each nibble into a character: add 0b110000 ('0') and then compare with 0b111001 ('9'), or do the comparison with 0b1001 first and then add a different offset. So for example, 0b111 becomes 0b110111; is that greater than 0b111001? No, so move on to 0b1011. Add 0b110000 and you get 0b111011; is that greater than 0b111001? Yes, so add either 0b111 or 0b100111, depending on whether you wanted to see upper or lower case letters. Now the string is, for example, 0b110111, 0b1000010 plus a terminating zero; send that off to print and you see 7B on the output.

Hex output is going to be faster, yes. How much, and how relevant that gain is, depends on a number of factors...

Now I have no idea what you mean by programming a source file in hex

unsigned int x = 0x5;

is going to take slightly longer to compile than

unsigned int x = 5;

Because of the extra characters. But

unsigned int x = 0x7B;

vs

unsigned int x = 123;

Hmm, decimal is probably still faster.

unsigned int x = 0x11111111;

vs

unsigned int x = 286331153;

Now you have to wonder: there is a point where the hex will be faster on a particular machine, and data patterns matter too, as shown here.

As shown here, the hex version takes two more bytes of storage to hold in the source file.

unsigned int x = 0x5;

unsigned int x = 5;

The compiled output is identical with respect to the constant being applied to x (0b101). So the machine code (and/or .data storage) is identical not only in size but bit-for-bit.

unsigned int fun0 ( void )
{
    return(5);
}
unsigned int fun1 ( void )
{
    return(0x5);
}
unsigned int fun2 ( void )
{
    return(123);
}
unsigned int fun3 ( void )
{
    return(0x7B);
}

gives this machine code

00000000 <fun0>:
   0:   e3a00005
   4:   e12fff1e

00000008 <fun1>:
   8:   e3a00005
   c:   e12fff1e

00000010 <fun2>:
  10:   e3a0007b
  14:   e12fff1e

00000018 <fun3>:
  18:   e3a0007b
  1c:   e12fff1e

There have been, and are, C libraries with %b, but it is non-standard; it never made sense why it isn't. Likewise octal... hmm, no, there is a standard one for octal (%o).

Note that octal conversion is competitive with hex: you don't have the conditional.

Take 0b1111011: mask and shift off 3 bits at a time to get 0b001, 0b111, 0b011, adding 0b110000 to each as you go, giving 0b110001, 0b110111, 0b110011. So you don't have the conditional, but you have more "characters" to deal with; for 8-bit numbers hex could win, but for larger numbers octal should win.

while on that topic:

unsigned int fun0 ( void )
{
    return(5);
}
unsigned int fun1 ( void )
{
    return(0x5);
}
unsigned int fun2 ( void )
{
    return(05);
}
unsigned int fun3 ( void )
{
    return(123);
}
unsigned int fun4 ( void )
{
    return(0x7B);
}
unsigned int fun5 ( void )
{
    return(0173);
}

gives

00000000 <fun0>:
   0:   e3a00005
   4:   e12fff1e

00000008 <fun1>:
   8:   e3a00005
   c:   e12fff1e

00000010 <fun2>:
  10:   e3a00005
  14:   e12fff1e

00000018 <fun3>:
  18:   e3a0007b
  1c:   e12fff1e

00000020 <fun4>:
  20:   e3a0007b
  24:   e12fff1e

00000028 <fun5>:
  28:   e3a0007b
  2c:   e12fff1e

So in terms of "storage" of the source code, 5 is cheaper than 05, which is cheaper than 0x5, while 0x7B is the same as 0173 but 123 is cheaper. Hex becomes cheapest as the numbers get larger (obviously: it has a higher base, 16 vs 8 vs 10).

Are you really that desperate for storage space in the source code? Then you need to be a tab person instead of a space person, and use short variable and function names. My long answer has probably filled up all of your RAM.

old_timer
  • 69,149
  • 8
  • 89
  • 168
  • bits is bits; the "computer" doesn't know an address from a constant from ASCII, they are just bits. For the duration of an instruction, some of the bits are for example an address. But sometimes those same bits are not an address but an operand: ptr++. The bits mean something to the humans; the computer couldn't care less. ASCII means absolutely nothing to the computer; the closest you might come to it having some relevance is an offset into a table of bits that humans would interpret as pixels, but to the computer those are just bits. – old_timer Feb 25 '18 at 06:25
  • It is like asking the brick if it knows it is part of a library or another one if it knows it is part of a house. In the grand scheme of things it is just a brick. To a human a collection of them becomes a house, but they are really just individual bricks. – old_timer Feb 25 '18 at 06:28
  • Instructions vs data is the same issue; this tends to blow away folks who understand that everything else is just bits. How does the CPU tell instructions from data? IT DOESN'T; it processes the bits that are fed to it. It is up to the humans to feed it the right bits, AND DATA ENTERS THE PROCESSOR; it goes into the pipe. There are lots of constants and addresses mixed in with instructions in a binary; these go into the pipe, but the humans have placed them such that they don't get executed (hopefully). The bits only have relevance as instructions for a short number of clock cycles. – old_timer Feb 25 '18 at 06:32
  • For addition and subtraction the CPU doesn't know signed from unsigned; that is only relevant to the human. For multiplication, yes, signed vs unsigned matters, but the compiler (where variable types exist, in the language) chooses the correct multiply or divide (sometimes signed vs unsigned doesn't matter with multiply or divide as well). Will leave this (as with other exercises) up to the reader to figure out. – old_timer Feb 25 '18 at 06:35