
In C, why is there no standard specifier to print a number in its binary format, something like %b? Sure, one can write functions/hacks to do this, but I want to know why such a simple thing is not a standard part of the language.

Was there some design decision behind it? Since there are format specifiers for octal (%o) and hexadecimal (%x), is it that octal and hexadecimal are somehow "more important" than the binary representation?

Since in C/C++ one often encounters bitwise operators, I would imagine it would be useful to have %b, or to directly write a binary representation of a number into a variable (the way one writes hexadecimal numbers, like int i = 0xf2).

Note: Threads like this discuss only the 'how' part of doing this, not the 'why'.

smilingbuddha
  • Why print binary if you can print hex? It's basically the same thing, but at 400% density. Just translate, `0 = 0000`, `1 = 0001`, `2 = 0010`, ..., `F = 1111`. – Kerrek SB Dec 04 '11 at 01:58
  • @KerrekSB - Just because it's not that hard to do yourself isn't a compelling reason to exclude it from the language. It's a pain to have to switch from writing to a stream to writing to a string buffer, translate, and then write to a stream. – Ted Hopp Dec 04 '11 at 02:02
  • @TedHopp: One could argue the other way: why *should* it be needed? You could ask for all sorts of stuff to be included (base-4? base-64?), but ultimately there's no right or wrong answer. The designers probably just didn't think it was necessary. I imagine they figured that anyone who was in the business of binary manipulation would be sufficiently versed in hex... – Kerrek SB Dec 04 '11 at 02:05
  • I reckon the reason there's no binary print format is the same as the reason there's no binary literals. But I don't know what that reason is, unless it's the obvious "nobody actually uses ASCII binary, because everyone can read hex just as well as they can read binary, perhaps better". That's "obvious" in the sense, "true of the people who designed the language, not necessarily true of every human being in the world". But the language wasn't designed for every human being in the world. The workaround is "learn to read hex", and learn hex `!`/`&`/`|` like you learned the multiplication tables. – Steve Jessop Dec 04 '11 at 02:22

5 Answers


The main reason is 'history', I believe. The original implementers of printf() et al at AT&T did not have a need for binary, but did need octal and hexadecimal (as well as decimal), so that is what was implemented. The C89 standard was fairly careful to standardize existing practice - in general. There were a couple of new parts (locales, and of course function prototypes, though there was C++ to provide 'implementation experience' for those).

You can read binary numbers with strtol() et al; specify a base of 2. I don't think there's a convenient way of formatting numbers in different bases (other than 8, 10, 16) that is the inverse of strtol() - presumably it should be ltostr().

Jonathan Leffler
    "The original implementers of printf() et al at AT&T did not have a need for binary, but did need octal and hexadecimal (as well as decimal), so that is what was implemented." - Do you have any proof for this claim? You could equally say that they simply wanted us to have a hard time when debugging bit operations. "I don't think there's a convenient way of formatting numbers in different bases" - Why do you think that? Again, you claim things without any explanation. – kol Dec 04 '11 at 02:16
  • If the implementers had needed binary output, it would have been provided. It wasn't provided; it is reasonable to infer they did not need it. The 7th Edition UNIX manual (which is available online) does not have any support for converting strings of binary digits to integers. As for 'formatting numbers in different bases', I refer you to the C standard - is there a function in the standard that allows you to format a number in base 2 or base 36? I don't think so. If you want to cite a counter-example, name the function; it is easy to provide proof that I'm wrong if I'm wrong. – Jonathan Leffler Dec 04 '11 at 02:24
  • Remember: UNIX was written by the people who used the system. They provided themselves (and us) with the tools that met their requirements. – Jonathan Leffler Dec 04 '11 at 02:27
  • @Jonathan: hmm, `itoa` is a pretty common extension. So, while it didn't make the standard I think you might be making too much of that fact. If it existed at AT&T, then it's false to claim that AT&T didn't need it. So, did it exist at AT&T, and which was the first UNIX to include it or something like it? I don't think actually it is true as you suggest that everything needed at AT&T made it into the C standard. – Steve Jessop Dec 04 '11 at 02:30
  • `itoa()` was not part of the SVID (System V Interface Definition); it is not part of POSIX 2008. That means that it is not a 'standard' extension, regardless of how common it is. (I found a reference to Microsoft deprecating this POSIX function, which is odd because it was not listed in POSIX 2004 or POSIX 1997 either.) – Jonathan Leffler Dec 04 '11 at 04:37

You ask "why" as if there must be a clear and convincing reason, but the reality is that there is no technical reason for not supporting a %b format.

K&R C was created by people who framed the language to meet what they thought were going to be their common use cases. An opposing force was trying to keep the language spec as simple as possible.

ANSI C was standardized by a committee whose members had diverse interests. Clearly %b did not end up being a winning priority.

Languages are made by men.

Raymond Hettinger

The main reason, as I see it, is: which binary representation should one use? One's complement? Two's complement? Are you expecting the actual bits in memory, or the abstract number representation?

Only the latter makes sense when C makes no requirements of word size or binary number representation. So since it wouldn't be the bits in memory, surely you would rather read the abstract number in hex?

Claiming an abstract representation is "binary" could lead to the belief that -0b1 ^ 0b1 == 0 might be true, or that -0b1 | -0b10 == -0b11.


Possible representations:

While there is only one meaningful hex representation --- the abstract one, the number -0x79 can be represented in binary as:

  • -1111001 (the abstract number)
  • 10000110 (one's complement)
  • 10000111 (two's complement)

@Eric has convinced me that endianness != left-to-right order...

The problem is further compounded when numbers don't fit in one byte. The same number could be:

  • 1111111110000110 as a one's-complement big-endian 16-bit number
  • 1111111110000111 as a two's-complement big-endian 16-bit number
  • 1000011011111111 as a one's-complement little-endian 16-bit number
  • 1000011111111111 as a two's-complement little-endian 16-bit number

The concepts of endianness and binary representation don't apply to hex numbers as there is no way they could be considered the actual bits-in-memory representation.

All these examples assume an 8-bit byte, which C does not guarantee (CHAR_BIT is required to be at least 8, but may be larger; indeed there have been historical machines with 10-bit bytes).


Why no decision is better than any decision:

Obviously one can arbitrarily pick one representation, or leave it implementation defined. However:

  • if you are trying to use this to debug bitwise operations (which I see as the only compelling reason to use binary over hex), you want to use something close to what the hardware uses, which makes it impossible to standardise, so you want an implementation-defined format.
  • Conversely if you are trying to read a bit sequence, you need a standard, not implementation defined format.
  • And you definitely want printf and scanf to use the same.

So it seems to me there is no happy medium.

tobyodavies
  • How is "which binary representation should I use" any different to "which hex representation should I use"? – Eric Oct 18 '12 at 19:18
  • @Eric because there is only one hex representation --- the abstract one. see my edit – tobyodavies Oct 18 '12 at 22:52
  • Endianess is not related to the concepts of right or left - you can choose whatever order you like to display the bits. If I write `0xAACC`, that is unambiguously `0b1010101011001100`. I still don't follow what makes binary in a different class to hex, when there's a 1 to 1 mapping between them. – Eric Oct 18 '12 at 23:04
  • What about `-0xAACC`? The difference is that binary is both an abstract numeric representation and a data representation, as I stated in the first paragraph: "are you [OP] expecting the actual bits in memory or the abstract number representation?" – tobyodavies Oct 18 '12 at 23:46
  • Obviously one can arbitrarily pick one, but if you are trying to use this to debug bitwise operations, you want to use something close to what the hardware uses (in terms of complement representation at least), which makes it impossible to standardise. Conversely if you are trying to read a bit sequence, you need a standard, not implementation-defined format. And you definitely want `printf` and `scanf` to use the same. So it seems to me there is no happy medium – tobyodavies Oct 18 '12 at 23:54
  • Besides, my argument is that the only reason to prefer `bin` over `hex` is if you want to see the bits in memory. Since this is impossible to standardise, you'd surely rather see `hex`? i.e. we are kind of arguing the same thing: that only the abstract representation makes sense, but claiming it's "binary" could lead to the belief that `-0b1 ^ 0b1 == 0` might be true – tobyodavies Oct 19 '12 at 00:01
  • Writing hex with a leading `-` is just strange. If I run `printf("%x", -1)` I get `ffffffff`. I would expect `%b` to behave in a similar way as `%x`. _"the difference is that binary is both an abstract numeric representation and data representation"_ - And hex isn't? – Eric Oct 19 '12 at 07:09
  • @Eric hex is not what electrons stored in transistors in RAM chips represent, binary is. The code you give is actually undefined behaviour --- on a ones-complement system that would print `fffffffe` without a typecast (i.e. `printf("%x", (unsigned) -1)` yields `ffffffff`). Obviously you realize that `unsigned x = 0xFFFFFFFF` is not the same as `int x = -0x1`, just that they happen to share a binary (bit) *representation* on the most common systems (twos-complement 32-bit systems). However `int x = 0b1` *is necessarily identical* to `int x = 0x1` in any system, as these are the same number. – tobyodavies Oct 21 '12 at 07:39
  • @Eric my point being that `0xFFFFFFFF == -0x1` is a coincidence of binary (bit) representation even though no binary numbers are involved. Whereas `-0x1 == -0b1` is necessarily true. This is the sense in which binary is distinct from hex. – tobyodavies Oct 21 '12 at 07:48
  • Still not following you. `0b11111111 == -0b1` is also a coincidence of internal representation. Heck, `255 == -1` is as well. You're not giving me anything to distinguish hex from binary. – Eric Oct 21 '12 at 08:49
  • `0b11111111 == -0b1` is a consequence of *binary* representation, not a consequence of *hex* representation. It is precisely the fact that *binary representation* is a synonym for *internal representation*. Internally, things are in one of several binary representations, not a hex representation. – tobyodavies Oct 21 '12 at 09:19

One answer may be that hexadecimal formatting is much more compact. See for example the hex view of Total Commander's Lister.

%b would be useful in lots of practical cases. For example, if you write code to analyze network packets, you have to read the values of individual bits; if printf had %b, debugging such code would be much easier. Even if omitting %b could be justified when printf was designed, it was definitely a bad idea.

kol

I agree. I was a participant in the original ANSI C committee and made the proposal to include a binary representation in C. However, I was voted down, for some of the reasons mentioned above, although I still think it would be quite helpful when doing, e.g., bitwise operations, etc.

It is worth noting that the ANSI committee was for the most part composed of compiler developers, not users and C programmers. Their objectives were to make the standard understandable to compiler developers, not necessarily to C programmers, and to keep the document no longer than it needed to be, even if this made it a difficult read for C programmers.

Guest-11