
I know that the C and C++ standards leave many aspects of the language implementation-defined because, if there were an architecture with different characteristics, a standard-conforming compiler for that architecture would otherwise need to emulate those parts of the language, resulting in inefficient machine code.

Certainly, 40 years ago every computer had its own unique specification. However, I don't know of any architectures in use today where:

  • CHAR_BIT != 8
  • signed is not two's complement (I heard Java had problems with this one).
  • Floating point is not IEEE 754 compliant (Edit: I meant "not in IEEE 754 binary encoding").

The reason I'm asking is that I often explain to people that it's good that C++ doesn't mandate other low-level aspects like fixed-size types. It's good because, unlike in 'other languages', it makes your code portable when used correctly (Edit: because it can be ported to more architectures without requiring emulation of low-level aspects of the machine, e.g. two's complement arithmetic on a sign-and-magnitude architecture). But I feel bad that I cannot point to any specific architecture myself.

So the question is: what architectures exhibit the above properties?

uint*_ts are optional.
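For anyone who wants to probe their own platform, here is a minimal sketch, assuming a hosted C99 implementation, that reports all three properties (the `-1 & 3` trick works because the standard permits exactly the three signed representations listed above; `__STDC_IEC_559__` is defined only by implementations that claim IEEE 754 support):

```c
#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("signed representation: %s\n",
           (-1 & 3) == 3 ? "two's complement" :
           (-1 & 3) == 2 ? "ones' complement" : "sign and magnitude");
#ifdef __STDC_IEC_559__
    puts("floating point: IEEE 754 (IEC 60559) conformance claimed");
#else
    puts("floating point: no IEEE 754 conformance claimed");
#endif
    return 0;
}
```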

Yakov Galka
    I think you have it backwards. If C++ were to mandate, say, two's complement for signed integers, it would make C++ code more portable, not less. The question of why the C++ standards committee doesn't mandate this is another matter. Especially as, despite what you say, it wouldn't be _impossible_ to write a compiler for a non-standard architecture; you can always simulate 8-bit chars or two's complement arithmetic even when your platform doesn't support it directly. – john Aug 07 '11 at 09:42
    @john: then it would be impractical, so a non-conforming compiler would generate faster code than a conforming one. And I still don't see how it makes your code more portable. – Yakov Galka Aug 07 '11 at 09:46
    Well, I would say that would be a problem for the exotic architectures, but compilers on such a platform could have options to generate standard or non-standard code. C++ would be more portable because code that assumes, say, two's complement arithmetic would now be covered by the standard instead of being technically incorrect. – john Aug 07 '11 at 09:49
    I'm sure the real reason the standard is the way it is isn't that it's some ideal solution. It's that when the standard was written many C and C++ compilers already existed, and the standards committee didn't want to reject existing compilers. – john Aug 07 '11 at 09:54
    @john, rejecting compilers is easier than rejecting hardware. When it comes to hardware, if the standard guarantees some aspect of implementation, it doesn't increase portability. On the contrary, it makes your program 'conforming' even if it can't be compiled and run on some platform. Today a C++ standard-conforming program can run on a greater number of platforms than if the standard *restricted* the kind of hardware you can use. – Yakov Galka Aug 07 '11 at 10:01
    You're talking about the portability of the language, I'm talking about the portability of code. Leaving aspects of the language unspecified makes it easier to write compilers but harder to write compliant code. In my view most language standards are too easy on the compiler writers (that's probably because the committee members are mostly compiler writers), especially as the difficulties posed by non-standard architectures don't seem that great to me. – john Aug 07 '11 at 10:30
    @john : I doubt that "making it easier for compiler writers" was a priority when creating the C++ standard (they'd be doing an awful job if it were, since C++ is one of the hardest languages to parse, and other aspects of the language don't exactly make it easy for compiler writers either). Performance, wide platform support, and backward compatibility are quite important, though. And all three would suffer if the restrictions you mention were added to the standard. – Sander De Dycker Aug 07 '11 at 11:15
    It is not about the compiler but the hardware. C++ leaves some things unspecified to allow direct use of the hardware features. Your phone apps won't run on a mainframe anyway, so there is no portability however conformant the code is. – Bo Persson Aug 07 '11 at 11:30
    @Bo "phone apps won't run on a mainframe anyway": not sure. If you stick to the parts of the standard that are guaranteed to be portable, then your code *will* be portable (and you *can* write quite a lot with this subset of C++). The point is that the standard does not lie by saying that something is standard when in fact it's not. – Yakov Galka Aug 07 '11 at 13:42
  • @ybungalobill - Code can be portable, but many applications are not. If you turn it the other way - The mainframe app doesn't work on the phone, because you cannot connect the 10.000 terminals it uses. But that's ok, if we are just allowed to write C++ code on either. And we can, exactly because the language standard leaves some parts open. If you want to run Java on the same mainframe you have to buy extra hardware, because that standard **did** specify some low level details. – Bo Persson Aug 07 '11 at 13:59
    @BoPersson: Historically, the expected thing for compilers to do in situations where the standard imposed no requirements was either to generate the platform's "natural" code for an action and let whatever happened happen, or else to substitute a *more useful* behavior. Unfortunately, in an effort to allow more "optimizations", compiler writers have recently decided to reverse some very long-standing precedents in ways that will result in code which is harder to read, takes longer to compile, and will be less efficient than code which could use platform behavior. – supercat May 14 '15 at 19:12
  • @BoPersson: I also find it curious that compiler authors decided to target constructs which had long-established useful behaviors on most "normal" architectures (e.g. left-shifting a negative value, or relational comparisons between unrelated pointers) rather than seeking to ease requirements which needlessly impair many useful optimizations (e.g. allowing an `int16_t` whose address is never taken to be replaced with `int32_t`). – supercat May 14 '15 at 19:17

7 Answers


Take a look at this one

Unisys ClearPath Dorado Servers

offering backward compatibility for people who have not yet migrated all their Univac software.

Key points:

  • 36-bit words
  • CHAR_BIT == 9
  • one's complement
  • 72-bit non-IEEE floating point
  • separate address space for code and data
  • word-addressed
  • no dedicated stack pointer

I don't know if they offer a C++ compiler, but they could.


And now a link to a recent edition of their C manual has surfaced:

Unisys C Compiler Programming Reference Manual

Section 4.5 has a table of data types with 9, 18, 36, and 72 bits.

[Image: size and range of data types in the Unisys C compiler]

Peter Kühne
Bo Persson
    I guess void* must be hellish to use in that architecture. – luiscubal Aug 07 '11 at 12:19
    The reason for the pointer sizes is that it is word addressed (another difference :) and accessing individual chars needs extra tricks. – Bo Persson Aug 07 '11 at 12:23
    Is `sizeof(char*) != sizeof(int*)` even standard? – Yakov Galka Aug 07 '11 at 13:13
    @ybungalobill - I believe `char*` and `void*` must be the same size, and large enough to hold any other pointer. The rest is up to the implementation. – Bo Persson Aug 07 '11 at 13:26
    @ybungalobill: On old Win16 compilers, regular pointers were near pointers and contained just a 16-bit offset, so `sizeof(int*) == 2`, but far pointers also had a 16-bit selector, so `sizeof(void*) == 4`. – Adam Rosenfield Aug 07 '11 at 16:09
    There is, or used to be, an on-line manual for their C++ compiler. It's also worth pointing out that this is just one of the Unisys mainframe architectures: the other is a 48-bit signed-magnitude tagged architecture (for which I've only found a C manual, not a C++ one). Concerning the rest: I don't think that `sizeof(int*) != sizeof(char*)` here: both are 36 bits. But the byte selector in the `char*` is on the high order bits, and is ignored in `int*`. (I've used other machines, however, where `sizeof(char*) > sizeof(int*)`.) – James Kanze Aug 07 '11 at 22:51
    @Adam Rosenfield On the MS/DOS 16 bit compilers, you had different "modes", and data pointers weren't necessarily the same size as function pointers. But at least on the ones I used, all data pointers (including `void*`) always had the same size. (Of course, you couldn't convert a function pointer to `void*`, since `void*` might be smaller. But according to the standard, you can't do that today, either.) – James Kanze Aug 07 '11 at 22:54
  • @James - The `char*` being different is my remembrance from how I once learned to do some string processing in assembly. It was a long time ago, and there might have been other (better) ways. :-) – Bo Persson Aug 08 '11 at 04:03
    @Bo Persson You've actually programmed on this machine? I'm remembering what I read in the C++ manual that I downloaded a couple of years ago. My memory is fallible, so I could easily be wrong. But from what I've seen on other word addressed machines: if the word contains more bits than are needed for addressing (and these machines date from an epoch when 24 bits was largely sufficient for addressing), the additional byte selector is put into the high order bits, rather than in a separate word. – James Kanze Aug 08 '11 at 08:03
    @James - Just a little bit, and a very long time ago. My university had a Univac 1100 and I learned the architecture there. C++ wasn't available (at all!) at the time so I don't know for sure what that would look like. – Bo Persson Aug 08 '11 at 08:18
    @Bo A couple of years ago, at least, Unisys had the manuals for all of its machines on line; I downloaded the C++ manual for the 2200, and the C manual for the MCP (ex-Burroughs---I wasn't able to find a C++ manual, so I suspect that it didn't support C++). But I didn't save them. It's possible that `char*` was larger than other data pointers, but I don't remember seeing this in the manual; I think I'd remember if I'd seen it, but I may not have looked in the right places. (It was only from curiosity.) – James Kanze Aug 08 '11 at 08:51
    @Calmarius Historical reasons. – Wiz May 30 '13 at 18:19
    The Unisys manual I just looked at showed a 71-bit "signed long long" type, but no unsigned type longer than 36 bits. As such, it would not be an example of a non-two's-complement C99 compiler. Unless someone has upgraded a compiler for a non-two's-complement platform to be C99 compliant, it seems silly to have C standards accommodate platforms upon which future versions of C aren't supportable anyway. – supercat Nov 10 '16 at 19:47
  • @BoPersson [C++20 mandates only two's complement for signed integrals](https://stackoverflow.com/q/57363324/183120). Does that mean Unisys servers with one's complement won't have C++20 compilers? – legends2k Sep 12 '19 at 12:59
  • This is one of those cases where I think Unisys should just provide non-standard extensions in their C implementation to support the special types, and have "normal" types act like everywhere else. Having some Unisys dinosaur juice leak into a language standard is sheer insanity, IMHO. – Kuba hasn't forgotten Monica Feb 28 '20 at 17:09
    @legends2k no, they already used a 71-bit type so they can easily drop 1 bit like that to make a two's complement integer type in C++20 (unless C++20 doesn't allow padding bits or trap representations, I haven't checked that), or the compiler can use manual bit manipulation to use the full 72-bit range with some performance tradeoff. You can check their documentation and see *Note: If the `CONFORMANCE/TWOSARITH` or `CONFORMANCE/FULL` compiler keywords are used, the range will be 0 to (2³⁶)-1* – phuclv Oct 06 '22 at 03:46

None of your assumptions hold for mainframes. For starters, I don't know of a mainframe which uses IEEE 754: IBM uses base-16 floating point, and both of the Unisys mainframes use base 8. The Unisys machines are a bit special in many other respects: Bo has mentioned the 2200 architecture, but the MCP architecture is even stranger: 48-bit tagged words. (Whether the word is a pointer or not depends on a bit in the word.) And the numeric representations are designed so that there is no real distinction between floating-point and integral arithmetic: the floating point is base 8; it doesn't require normalization, and unlike every other floating point I've seen, it puts the decimal point to the right of the mantissa, rather than the left, and uses signed magnitude for the exponent (in addition to the mantissa). The result is that an integral floating-point value has (or can have) exactly the same bit representation as a signed-magnitude integer.

And there are no floating-point arithmetic instructions: if the exponents of the two values are both 0, the instruction does integral arithmetic; otherwise, it does floating-point arithmetic. (A continuation of the tagging philosophy in the architecture.) Which means that while int may occupy 48 bits, 8 of them must be 0, or the value won't be treated as an integer.
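To make the "an integer is a floating-point value with a zero exponent" property concrete, here is a hypothetical decoder for such a 48-bit word. The field layout (mantissa sign, exponent sign, 6-bit octal exponent, 39-bit integer mantissa) is an assumption based on published B6700 documentation, not something stated in this answer:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch only: decode a B6700-style numeric word into a double. */
double decode_word(uint64_t w) {
    int ms = (int)((w >> 46) & 1);       /* mantissa sign           */
    int es = (int)((w >> 45) & 1);       /* exponent sign           */
    int ex = (int)((w >> 39) & 0x3F);    /* exponent, a power of 8  */
    uint64_t m = w & 0x7FFFFFFFFFULL;    /* 39-bit integer mantissa */
    double v = (double)m * pow(8.0, es ? -(double)ex : (double)ex);
    return ms ? -v : v;
}

int main(void) {
    /* With all exponent bits zero, the same bits also read as the
       signed-magnitude integer 42. */
    printf("%g\n", decode_word(42));
    return 0;
}
```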

Jonathan Leffler
James Kanze
    IBM mainframes (z/Architecture) do support IEEE 754 floating point. – Nikita Nemkin Jun 01 '16 at 09:47
    fyi see [this twitter comment](https://twitter.com/stephentyrone/status/798216634645250049) – Shafik Yaghmour Nov 14 '16 at 18:59
    @Nikita - They do *now*. Initially it was an (expensive) add-on to support Java. – Bo Persson Mar 29 '17 at 16:14
    [ClearPath Enterprise Servers C Programming Reference Manual](http://public.support.unisys.com/aseries/docs/ClearPath-MCP-18.0/86002268-207.pdf). And [here's the Burroughs sign-magnitude number format](https://books.google.com.vn/books?id=j_0QDgAAQBAJ&pg=PA116&dq=Burroughs+B6700+Format&hl=en&sa=X&ved=0ahUKEwjihZ2xt_HbAhWCQpQKHfA9ACIQ6AEIKTAA#v=onepage&q=Burroughs%20B6700%20Format&f=false) – phuclv Jun 26 '18 at 13:35

Full IEEE 754 compliance is rare in floating-point implementations. And weakening the specification in that regard allows lots of optimizations.

For example, subnormal support differs between x87 and SSE.

Optimizations like fusing a multiplication and an addition that were separate in the source code change the results slightly too, but this is a nice optimization on some architectures.
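A sketch of that effect, with inputs chosen so that the two roundings of `a * b + c` cancel to zero while the single rounding of `fma` does not (link with -lm; whether the compiler contracts `a * b + c` into an fma on its own depends on its FP_CONTRACT default, e.g. gcc's -ffp-contract flag):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 1.0 + 0x1p-27;  /* 1 + 2^-27 */
    double b = 1.0 + 0x1p-27;
    double c = -(1.0 + 0x1p-26);
    /* a*b rounds to 1 + 2^-26, so the sum rounds to zero... */
    printf("a*b + c    = %a\n", a * b + c);
    /* ...while fma rounds the exact result once: 0x1p-54. */
    printf("fma(a,b,c) = %a\n", fma(a, b, c));
    return 0;
}
```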

Or, on x86, strict IEEE compliance might require certain flags to be set, or additional transfers between floating-point registers and normal memory, to force it to use the specified floating-point type instead of its internal 80-bit floats.

And some platforms have no hardware floats at all and thus need to emulate them in software. Some of the requirements of IEEE 754 can be expensive to implement in software; in particular, the rounding rules can be a problem.
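For instance, the four IEEE 754 rounding directions are selectable through `<fenv.h>`, and a software float library has to honor all of them. A small sketch (volatile blocks compile-time folding; strictly, FENV_ACCESS must be on, and some compilers additionally need a flag like gcc's -frounding-math):

```c
#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON /* some compilers warn and ignore this */

int main(void) {
    volatile double one = 1.0, three = 3.0;
    const int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
    const char *names[] = { "to nearest", "upward", "downward", "toward zero" };
    for (int i = 0; i < 4; ++i) {
        fesetround(modes[i]);                 /* switch rounding direction */
        printf("%-12s 1/3 = %.20f\n", names[i], one / three);
    }
    return 0;
}
```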

My conclusion is that you don't need exotic architectures to get into situations where you don't always want to guarantee strict IEEE compliance. This is why few programming languages guarantee strict IEEE compliance.

CodesInChaos
    Indeed, the 80-bit internal representation plays havoc with optimization settings. Not getting the same output between a debug and a release build is really unnerving... – Matthieu M. Aug 07 '11 at 11:11
    Another "exotic" set of hardware is IBM mainframes where the floating point format predates the IEEE standard. Unlike Java, C++ can still use the existing hardware. – Bo Persson Aug 07 '11 at 11:26
    IEEE 754 is not fully supported by GPUs. – kerem Aug 07 '11 at 11:54
    The lack of strict compliance with IEEE 754 is a bother to some, but I don't think it's quite in the scope of the issues that the OP really cares about. – Omnifarious Aug 07 '11 at 17:01
    Modern IBM mainframes support both IBM native floating point (base 16) and IEEE. But at least when I last looked, the support for IEEE was optional, and a lot slower. – James Kanze Aug 08 '11 at 08:47
    @Matthieu Since this is also tagged "C", I should mention a C analyzer that can tell you all the values your floating-point program may take with 80 bits floating-point registers spilled to memory at the C compiler's whim. http://blog.frama-c.com/index.php?post/2011/03/03/cosine-for-real – Pascal Cuoq Aug 08 '11 at 17:18
    @MatthieuM.: It's too bad ISO/ANSI didn't allow variadic parameters to specify minimum/maximum sizes for floating-point and integer arguments; if they had, the 80-bit `long double` could have been a useful and long-lived type, since the one real problem with it was that it works badly with `printf`. The fact that the extended double stores the leading 1 explicitly speeds up calculations on non-FPU systems and would also eliminate the need for special handling of denormals in any context other than conversions to/from other types. Too bad C's `printf` messed everything up. – supercat Apr 30 '15 at 16:32
    [Do any real-world CPUs not use IEEE 754?](https://stackoverflow.com/q/2234468/9957140) – phuclv May 25 '18 at 06:37

I found this link listing some systems where CHAR_BIT != 8. They include

  • some TI DSPs, which have CHAR_BIT == 16
  • the BlueCore-5 chip (a Bluetooth chip from Cambridge Silicon Radio), which has CHAR_BIT == 16

And of course there is a question on Stack Overflow: What platforms have something other than 8-bit char

As for non two's-complement systems there is an interesting read on comp.lang.c++.moderated. Summarized: there are platforms having ones' complement or sign and magnitude representation.
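One practical consequence of CHAR_BIT != 8: on a 16-bit-char DSP a single char holds two network octets, so portable I/O code has to unpack them itself. A hypothetical helper (the name and the big-endian packing within a char are assumptions; it requires CHAR_BIT to be a multiple of 8):

```c
#include <limits.h>
#include <stddef.h>

/* Copy n_octets 8-bit octets out of a packed char buffer, one octet
   per output element, regardless of the platform's CHAR_BIT. */
void unpack_octets(const unsigned char *in, size_t n_octets,
                   unsigned char *out) {
    const int per_char = CHAR_BIT / 8;            /* octets per char    */
    for (size_t i = 0; i < n_octets; ++i) {
        size_t c = i / (size_t)per_char;          /* source char        */
        int slot = (int)(i % (size_t)per_char);   /* octet within it    */
        int shift = (per_char - 1 - slot) * 8;    /* big-endian packing */
        out[i] = (unsigned char)((in[c] >> shift) & 0xFF);
    }
}
```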

Cody Gray
dcn
    Analog Devices 32-bit SHARC DSP has `CHAR_BIT=32`, and Texas Instruments DSP from TMS32F28xx has `CHAR_BIT=16`. GCC 3.2 for PDP-10 has `CHAR_BIT=9`. I think, S/360 may have a not-8bit char too. – osgx Aug 07 '11 at 09:52
    I would still like an example of a 'non two's complement' architecture, especially since it turned out that the `CHAR_BIT` part is a partial duplicate. – Yakov Galka Aug 07 '11 at 10:03
  • TI DSPs have 16-bit chars only because the implementers chose it (it'd be a bit more work to get it to work right, but not absurdly hard IIRC - probably just some "holes" in the codegen scaffolding in the underlying compiler). So it's not some deep architectural reason. C code works on an abstract machine. If all you have is 16-bit INTs, store two chars in each, and add read-modify-write merging to the peephole optimizer (at the very least). Sure, it's more work, but just look at how much more work is for everyone to deal with such odd types in places where they won't ever show up. Yuck. – Kuba hasn't forgotten Monica Feb 28 '20 at 17:14

I'm fairly sure that VAX systems are still in use. They don't support IEEE floating-point; they use their own formats. Alpha supports both VAX and IEEE floating-point formats.

Cray vector machines, like the T90, also have their own floating-point format, though newer Cray systems use IEEE. (The T90 I used was decommissioned some years ago; I don't know whether any are still in active use.)

The T90 also had/has some interesting representations for pointers and integers. The C and C++ compilers had CHAR_BIT==8 (necessary because it ran Unicos, a flavor of Unix, and had to interoperate with other systems), but a native address could only point to a 64-bit word. All byte-level operations were synthesized by the compiler, and a void* or char* stored a byte offset in the high-order 3 bits of the word. And I think some integer types had padding bits.
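A conceptual sketch, not actual Cray code, of what such a synthesized byte load might look like, with the byte offset carried in the top 3 bits of the pointer as described above (which byte within the word the offset selects is an assumption here):

```c
#include <stdint.h>

/* Simulate a char load on a word-addressed machine: 'cp' is a char
   pointer whose high 3 bits select a byte within the 64-bit word
   addressed by the remaining bits. */
unsigned load_byte(uint64_t cp, const uint64_t *memory) {
    uint64_t word_index = cp & 0x1FFFFFFFFFFFFFFFULL; /* low 61 bits */
    unsigned byte_off   = (unsigned)(cp >> 61);       /* high 3 bits */
    uint64_t word = memory[word_index];
    return (unsigned)((word >> ((7 - byte_off) * 8)) & 0xFF);
}
```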

IBM mainframes are another example.

On the other hand, these particular systems needn't necessarily preclude changes to the language standard. Cray didn't show any particular interest in upgrading its C compiler to C99; presumably the same thing applied to the C++ compiler. It might be reasonable to tighten the requirements for hosted implementations, such as requiring CHAR_BIT==8, IEEE format floating-point if not the full semantics, and 2's-complement without padding bits for signed integers. Old systems could continue to support earlier language standards (C90 didn't die when C99 came out), and the requirements could be looser for freestanding implementations (embedded systems) such as DSPs.

On the other other hand, there might be good reasons for future systems to do things that would be considered exotic today.

Keith Thompson
    Good point at the end about how overly strict standards prevent innovation. When we get quantum (or organic) computers with trinary states, the modulo arithmetic requirements for `unsigned` integral types will be a major pain, while signed arithmetic will be just fine. – Ben Voigt Aug 27 '13 at 16:17
    @BenVoigt Why is unsigned arithmetic a pain? Aren't modulo-3^n adders possible in those computers? – phuclv Feb 28 '15 at 09:13
    @LưuVĩnhPhúc: That's exactly the point: with hardware operations performed modulo 3**n, providing C++ unsigned types whose operations are defined modulo 2**n will be difficult. – Ben Voigt Feb 28 '15 at 18:24
    I know of one VAX 11/780 still in use as a host for a cross compiler targeting a specialised embedded system with a proprietary architecture. To sustain that particular VAX, the custodians have been approaching museums for spares. – Peter Oct 15 '16 at 12:49
  • @Peter: Interesting. There are newer models of VAXen. There are also Alpha systems running VMS; I wonder if the cross-compiler could run on one of them. – Keith Thompson Oct 15 '16 at 19:16
    @Keith - technically, the only obstacle is going through a process to provide evidence that will satisfy regulatory requirements, since the target embedded system is high criticality. There are a bunch of non-technical obstacles (organisational politics, etc), however, that to date have been insurmountable. Currently it is easier to mount a case to raid museums than to update the host. – Peter Oct 16 '16 at 01:18
    IIRC Unicos on the Cray J90 was also fun for having C sizeof(int) == 8, but INT_MAX equal to something smaller like 2^52-1 (or was it 2^46-1?), as that let int be fast, done within the floating-point hardware. Full 64-bit longs were a slower, software-emulated operation. – gps Oct 14 '21 at 09:08

CHAR_BIT

According to gcc source code:

CHAR_BIT is 16 for the 1750a and dsp16xx architectures.
CHAR_BIT is 24 for the dsp56k architecture.
CHAR_BIT is 32 for the c4x architecture.

You can easily find more by doing:

find $GCC_SOURCE_TREE -type f | xargs grep "#define CHAR_TYPE_SIZE"

or

find $GCC_SOURCE_TREE -type f | xargs grep "#define BITS_PER_UNIT"

if CHAR_TYPE_SIZE is appropriately defined.

IEEE 754 compliance

If the target architecture doesn't support floating-point instructions, gcc may generate a software fallback which is not standard-compliant by default. Moreover, special options (like -funsafe-math-optimizations, which also disables sign preservation for zeros) can be used.
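A small demonstration of what "sign preservation for zeros" means, assuming an IEC 60559 implementation; with -fno-signed-zeros (implied by -funsafe-math-optimizations) gcc is free to treat the two zeros as interchangeable and change this output:

```c
#include <stdio.h>

int main(void) {
    volatile double pz = 0.0, nz = -0.0; /* volatile: defeat constant folding */
    printf("1/+0 = %g\n", 1.0 / pz);     /* inf  */
    printf("1/-0 = %g\n", 1.0 / nz);     /* -inf */
    return 0;
}
```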

ivaigult
    Upvoted for simply directing the OP to look at the source of a popular compiler; this is the definition of RTFM in this case, so it should be the first place people look. – underscore_d Oct 29 '17 at 20:12

IEEE 754 binary representation was uncommon on GPUs until recently; see GPU Floating-Point Paranoia.

EDIT: a question has been raised in the comments whether GPU floating point is relevant to usual computer programming, unrelated to graphics. Hell, yes! Most high-performance industrial computation today is done on GPUs; the list includes AI, data mining, neural networks, physical simulations, weather forecasting, and much more. One of the links in the comments shows why: GPUs have an order-of-magnitude floating-point advantage.

Another thing I'd like to add, which is more relevant to the OP's question: what did people do 10-15 years ago, when GPU floating point was not IEEE and there was no API such as today's OpenCL or CUDA to program GPUs? Believe it or not, early GPU computing pioneers managed to program GPUs without an API for it! I met one of them in my company. Here's what he did: he encoded the data he needed to compute as an image, with pixels representing the values he was working on, then used OpenGL to perform the operations he needed (such as "gaussian blur" to represent a convolution with a normal distribution, etc.), and decoded the resulting image back into an array of results. And this was still faster than using the CPU!

Things like that are what prompted NVidia to finally make their internal data binary-compatible with IEEE and to introduce an API oriented toward computation rather than image manipulation.

Michael
    @ybungalobill, offloading repetitive work to GPU is currently *the preferred* method for [large scale computations](https://blogs.nvidia.com/blog/2016/09/28/gtc-europe-keynote/?mkt_tok=eyJpIjoiWVRVek1tRmxNbUppT0RNeiIsInQiOiJ1VnpGSjBvOGtYcm9NSFp6V3lVdFRmUXZXUWJJb2k4TG9yeE5pXC9WMEl0TU9uQjNvNjBLXC9UK242NTc1XC9lQndxVzlKTzFTbjdLb0lzZjBtVUxJSVJ1YmppamdcL0RXT0JHWWh3TkVnTGFrMU09In0%3D). In fact, I am currently developing one in C++. Fortunately, we only work with NVidia [CUDA](http://docs.nvidia.com/cuda/index.html#axzz4N5KlOXmv) GPUs that have IEEE 754 compatible binary representation of floats. – Michael Oct 14 '16 at 18:24
    @ybungalobill: several answers to that. First, [CUDA does support C, C++, and Fortran](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz4N5KlOXmv). See the same link for the humongous performance advantage of 2048-thread GPUs over your typical 8-thread CPU. Second, true, only subsets (although large ones) of those languages are supported, including a lack of support, until CUDA 5.0, for recursion appropriate to the CUDA programming model (a feature called "dynamic parallelism"). Third, recursion can usually be replaced by loops, which is necessary for multithreaded performance anyway. – Michael Oct 14 '16 at 20:22