12

[C++11: 1.7] talks about bytes in terms of bits:

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

However, I cannot find anywhere in the standard that defines "bit".

So is it true to say that C++ does not place limitations on the number of values that may be represented by a single bit?

Does it allow, say, tri-state bits?

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • 2
    We have some 'indirect definition' in `bitset`: "Each bit represents either the value zero (reset) or one (set)." (20.5._0_.3). You could say that it is just standard library class, but _at lest this part of c++ requires 2-states bit_ – Lol4t0 Sep 29 '12 at 20:15
  • @Lol4t0: ooh, that's interesting! – Lightness Races in Orbit Sep 29 '12 at 20:32

3 Answers3

6

Among the normative references listed in [C++11: 1.2] is "ISO/IEC 9899:1999, Programming languages — C".

In turn, this standard says:

[C99: 3.5]: 1 bit unit of data storage in the execution environment large enough to hold an object that may have one of two values

This doesn't preclude a bit being a unit of data storage that's even larger, so C++ as a language indeed could support tri-state bits.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • I'm not sure C++ would accept all the three states of a tri-state-bit. How would you define bit-and `&` or bit-complement `~` in that case. I also believe C would be very hard to port to decimal machines (like the IBM1620 from the early 1960s). – Basile Starynkevitch Sep 29 '12 at 19:43
  • 2
    @Basile: The standard appears to leave the definition of "the one's complement of its operand" (for `~`, `5.3.1/10`) and "the bitwise `AND` function" (`5.11.1`) to the reader. Thus a tri-state bit system could define these and C++ would have to accept this. I suppose it would also require a re-definition of one's complement for the new bit system. – Lightness Races in Orbit Sep 29 '12 at 19:46
  • @BasileStarynkevitch EBCDIC on a s390 is a big enough pain in the tail. I can only imagine the mess a BCD system would stir up. – WhozCraig Sep 29 '12 at 19:47
  • 3
    Out of true curiosity, isn't the "one of *two* values" reasonably approachable to conclude the "of two" infers "only two" ? – WhozCraig Sep 29 '12 at 19:49
  • 5
    In computing, `bit` is unambiguous, unlike a byte. It can only have 2 values, usually denoted by 0 and 1. It's also shorthand for "binary digit". Information theory defines it somewhat different, it's a piece of information that can only have 2 values. – nos Sep 29 '12 at 19:53
  • @WhozCraig: If a bit is "large enough" to hold "an object that may have one of two values" then yes, the definition of _that_ object is fixed. However, no mention is made of a bit not also being allowed to be large enough to hold an object that may have one of _three_ values, where the initial criterion would still be satisfied. – Lightness Races in Orbit Sep 29 '12 at 19:55
  • 1
    @nos: Where is this defined normatively in C++? – Lightness Races in Orbit Sep 29 '12 at 19:56
  • @LightnessRacesinOrbit So that was my question. To be a "bit" as I read that description tells me whatever the backend, it can support two values, and must be set to one of them at all times. As i read it, anything that fills that is a "bit" in the standard sense; anything that doesn't (such as holding more than two) no longer qualifies. I have no idea if that was the intent of the standard wording, but it was what I took from it. – WhozCraig Sep 29 '12 at 20:01
  • @WhozCraig: I would imagine you're right that that would be the fundamental intent of the passage, but I'm being massively pedantic here. – Lightness Races in Orbit Sep 29 '12 at 20:02
  • @LightnessRacesinOrbit We all need to do that from time to time.. step out of the box and hit it with a big'ol'hammer =P – WhozCraig Sep 29 '12 at 20:04
  • 2
    A bit is a binary digit, base 2, nothing else. I'm surprised that either standard defines what a "bit" is. It violates an unstated rule of standards and requirements writing: *You don't define things that have a clear, unambiguous, universal meaning.* Compare with `sqrt`: What does that function do? "The `sqrt` functions compute the nonnegative square root of x." What's this nonnegative square root? It's not defined because everyone knows what it is. It is unambiguous. So is "bit". – David Hammen Sep 29 '12 at 20:05
  • @DavidHammen: `A bit is a binary digit, base 2, nothing else` [citation needed] You're right in that certain mathematical definitions seem to be left unresolved too. Naturally at some point there has to be a level of uncited referencing of common scientific concepts. I'm just not entirely sure that we can decide for ourselves that "bit" must fit into that box. – Lightness Races in Orbit Sep 29 '12 at 20:06
  • @Lightness Races In 3.9.1.7 most likely, as it refers to a bit as a pure binary numeral system. "A positional representation for integers that uses the binary digits 0 and 1 " – nos Sep 29 '12 at 20:11
  • An object cannot be held by anything of size less than one byte in C++, because there is no type that can represent it and in C++ every object has a type (as opposed to objects in C). – Johannes Schaub - litb Sep 29 '12 at 20:13
  • @LightnessRacesinOrbit: I think that a `bit` is an unambiguous in itself. What is defined here is a `bit unit storage`, which could be larger in order to hold other information *on top of the bit*. For example, one could store every bit in a *3 bit large unit storage* and decide the value the storage has by majority vote, to protect against cosmic rays. – Matthieu M. Sep 29 '12 at 20:13
  • 2
    The Harpers Collins Dictionary of Mathematics, **bit** *n. abbrev. for* binary digit. ... Just one of many. A standards committee doesn't define every last word it uses when writing a standard for the simple reason that the standard would never come out. You are playing language lawyer, but you are forgetting that the law (and standards) have an underlying "reasonable person" concept. A trit is not a bit. Saying that a bit is ambiguous, that it might be a trit, violates the reasonable person concept. – David Hammen Sep 29 '12 at 20:13
  • @LightnessRacesinOrbit - "Naturally at some point" -- What's a point? – David Hammen Sep 29 '12 at 20:16
  • 1
    @nos: Aha! That looks promising. – Lightness Races in Orbit Sep 29 '12 at 20:32
  • @MatthieuM.: Hmm, that's an interesting take. Makes sense, too. – Lightness Races in Orbit Sep 29 '12 at 20:32
  • @DavidHammen: A position on the scale of terminology abstractions. Bit being closer to the extreme of "this is just an unambiguously understood concept" than to the extreme of "this, when not defined here, could mean anything". – Lightness Races in Orbit Sep 29 '12 at 20:34
  • @LightnessRacesinOrbit - I was thinking of "point" as it's defined in Euclidean geometry. It's not defined. "Point" is one of the three undefined terms in Euclidean geometry, the other two being "line" and "plane". – David Hammen Sep 29 '12 at 20:36
  • I referenced your answer in this [question](http://stackoverflow.com/questions/23020323/how-do-we-apply-content-not-explicitly-cited-from-the-normative-references-to-th). Surprisingly I could not find many references to section `1.2` in SO question or answers. – Shafik Yaghmour Apr 11 '14 at 18:53
  • I was hoping you might have a comment on [Casey's answer to my question](http://stackoverflow.com/a/23021390/1708801). The answer basically says you can not cite a normative reference unless explicitly cited within the C++ standard. Do you believe this is allowed? His answer sounds reasonable but not one of comments here complains that you are doing so, so it make me wonder if the answer is missing something. – Shafik Yaghmour Apr 14 '14 at 12:04
2

3.9.1.7 says

Types bool, char, wchar_t, and the signed and unsigned integer types are collectively called integral types.48) A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system.49) [ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. — end example ]"

The note 49 reads

A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)

nos
  • 223,662
  • 58
  • 417
  • 506
  • 7
    Playing language lawyer, examples and notes are not normative. – David Hammen Sep 29 '12 at 20:30
  • 1
    IIRC, In C this note is part of the normative text. can someone verify that? However I don't know whether the normative text "pure binary numeration system." of C++ offers space for any doubts. – Johannes Schaub - litb Sep 29 '12 at 21:02
  • @JohannesSchaub-litb: It's note 40 in C99 – Lightness Races in Orbit Sep 29 '12 at 21:12
  • @DavidHammen Still, 3.9.1.7 specifies the representation to be a binary numeral system - a binary numeral system only have 2 symbols for their digits. And that would mean trits (trinary bits) can not be used, at least as seen from C++ , and it'll mean the integer operators (~, & and so on) operates on those binary digits. – nos Oct 01 '12 at 10:13
  • @nos: §1.9/1 "The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below." So this quote is not enough to prevent a C compiler for a ternary machine. – Mooing Duck Jan 04 '13 at 22:51
  • So `~` and such could be emulated (if slowly) – Mooing Duck Jan 04 '13 at 23:51
  • @MooingDuck Ofcourse. You could build a C++ compiler on top of pretty much anything. But your C++ code wouldn't know that, or make use of the fact that the binary representation was emulated. So from the view of the C++ code, integers are represented by binary no matter the what machinery is in place to provide that. – nos Jan 05 '13 at 00:00
  • @nos: as far as I can find, the only places where stuff acts like binary are: `&`, `^`, `|`, `>>`, `<<`, and unsigned rollover. Everything else appears indifferent. So yes. – Mooing Duck Jan 05 '13 at 00:07
1

I'm going to disagree with the accepted answer, since that is emulatable by a ternary machine, which is expressly allowed by the spec.

§ 3.9.1/4 Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer.
§ 1.8/5 An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
§ 3.9/9 Arithmetic types (3.9.1)... are collectively called scalar types. Scalar types, ... arrays of such types... are collectively called POD types. Scalar types ..., arrays of such types... are collectively called trivially copyable types.
§ 3.8/2 For any object... of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

The problem here is that at all points, the state of all trivially copiable multibyte objects must be copiable to an array of char and back without loss. This means that a ternary machine emulating a base 2 machine (as is required by the basic arithmetic types having modulo "rollovers"), must emulate those rollovers from each emulated byte to the next in each and every unsigned multibyte arithmetic operation.

Even this is emulatable on a ternary machine, slowly, but if all primitive types are made of exactly 41 trits than all a compiler has to worry about is unsigned rollover/under, which might be viable. (Obviously, emulating ^, | and & is also slow, but that's less of an issue in my mind)I think it could be done, but is amazingly impracticable to make a standard conforming C++ compiler for a ternary machine.

Mooing Duck
  • 64,318
  • 19
  • 100
  • 158