Once again, I'm teaching a class where I get to answer students' questions about C. Here's one I don't know the answer to: was there a rationale behind accepting `signed` as the default modifier for C? One would have thought `unsigned` was the natural choice. So, was this really a design decision?

- The "rationale" itself isn't totally correct. For plain `char`, it's not always `signed`. – Eric Z Sep 04 '13 at 01:28
- Why would unsigned int be more natural? I think most real world problems deal with both positive and negative values. – jxh Sep 04 '13 at 01:30
- @jxh More to the point, most real world problems deal with _small_ numbers - that is, numbers relatively near 0. I think most people in most cases are far more likely to need numbers below (or at least near) 0 than they are to need numbers greater than (or even near to) `INT_MAX`. Signed numbers keep both the upper and lower bounds as far away from the most commonly used numbers as possible. – Darrel Hoffman Sep 04 '13 at 04:09
5 Answers
In terms of the standard (since your question is tagged as such), `signed` was marked as the default because that's how it was with the C implementations that came before the standard.
The original ANSI/ISO standard mandates were to codify existing practice rather than create a new language. Hence the behaviour of pre-standard implementations was the most important factor, as per the rationale document:
> The original X3J11 charter clearly mandated codifying common existing practice, and the C89 Committee held fast to precedent wherever that was clear and unambiguous.
>
> The vast majority of the language defined by C89 was precisely the same as defined in Appendix A of the first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie, and as was implemented in almost all C translators of the time. (This document is hereinafter referred to as K&R.)
If you're looking to find out why the pre-standard implementations preferred `signed`, you'll probably have to look into the architecture of the PDP-n machines for which UNIX and C were originally developed.
The History of C page shows that `unsigned` was actually a relative latecomer to the language, appearing sometime in the mid '70s:
> During 1973-1980, the language grew a bit: the type structure gained unsigned, long, union, and enumeration types, and structures became nearly first-class objects (lacking only a notation for literals).
- Hah. That's interesting. I'm leaning towards this answer, but notice that it still begs the question somewhat: why would previous C implementations have `signed` as the default? – Dervin Thunk Sep 04 '13 at 01:31
- @Dervin, Why not? A `signed` type is suitable for both positive and negative values, which are used in daily life. – Eric Z Sep 04 '13 at 01:42
- @Eric, Right, it was just that I thought the modifiers came about at the same time (which I was wrong to assume), so were I in the language design stage, I would have made `char` unsigned, and asked the programmer to tell me explicitly if s/he wanted otherwise. History is so interesting! – Dervin Thunk Sep 04 '13 at 01:48
- According to Wikipedia, C was developed for the PDP-7, but I'm sure K&R did work on earlier computers, and were heavily influenced by the BCPL language. – Dervin Thunk Sep 04 '13 at 01:50
- @Dervin, yes, as one of the few code monkeys still alive who worked in and implemented BCPL compilers :-), the similarities far outweigh the differences (at least for early C). But it was a beautiful language for its time and I place MartinR right up there with dmr. And it was originally the PDP-7, yes, though they soon had to port it to other machines, including IBM's big iron. – paxdiablo Sep 04 '13 at 02:26
- @paxdiablo: Respect! :) I'm quite nostalgic for that era, even though I was nowhere near born, but I've idealized it as one of innovation and no limits. Jealous you lived it from the inside. You should tell your story sometime in a blog or something. – Dervin Thunk Sep 04 '13 at 12:42
It's largely about backward compatibility, and C's descent from earlier languages that could not easily support both signed and unsigned integers.
C was derived from an older language called B, which was derived from an even older language called BCPL (which was a simplified version of CPL).
BCPL was a largely untyped language. A variable declaration did not specify the type of an object; rather, an operation on a given variable would treat it as if it were of a given type.
The BCPL operators `+`, `-`, `*`, `/`, and `REM` treated their operands as signed integers, and yielded integer results.
If BCPL had supported unsigned integers, then either it would have had to have another set of operators for unsigned operands, or it would not have been able to represent negative numbers at all. (Note that BCPL did not support floating-point.)
B's syntax was quite different from BCPL's (and closer to C's), but it retained much of the same semantics. In particular, variables and functions were of integer type by default -- and there was no `unsigned` keyword.
Early C, based on B, also did not have an `unsigned` keyword. It had only four fundamental numeric types: `char`, `int`, `float`, and `double`. (`unsigned` was added, along with `long`, `union`, and `enum`, some time between 1973 and 1980.) Given the weakly-typed nature of the language, programmers sometimes used pointers when they needed unsigned arithmetic.
The "feature" that an entity with no declared type is implicitly of type `int` was retained in C until the 1999 ISO standard finally removed the "implicit `int`" rule.
Furthermore, signed integer types just tend to be more useful than unsigned types. The ability to represent negative values can be extremely convenient. Given the typical wraparound semantics, an error in an unsigned subtraction of two small values can yield a huge positive value (`3 - 4 == 65535`, for example, for a 16-bit unsigned type). Even in the systems programming domain that's the main target of all these languages, it's sometimes necessary to represent negative values (for example, a change in some quantity).

According to _The Development of the C Language_, the notion of `unsigned` was an extension to the language when features were being added to it between 1973 and 1980. Although not explicitly stated, the narrative suggests it wasn't introduced until 1977 (see Portability, paragraph 3).
So, defaulting to signed was due to the fact that the language initially only had signed types.

The default signedness of `char` isn't defined by the language; it is defined by the implementation. Some CPUs handle signed `char` more naturally, and others unsigned.

- @DervinThunk: The naturalness of the instructions to expand an 8-bit quantity to a 16-bit or larger has particular assumptions built in, especially for CPUs before about 1985 or so. To promote a char to an int in the *unnatural* direction requires extra instructions to make it so. The *natural* direction requires only a single instruction. – wallyk Sep 04 '13 at 01:32
- @wallyk, Are you sure the standard doesn't mandate the signedness of plain `int`? I remember the standard does mandate the smallest range of values that can be represented by `int`. See http://stackoverflow.com/questions/6155784/range-of-values-in-c-int-and-long-32-64-bits – Eric Z Sep 04 '13 at 01:35
- That applies only to plain `char`. Plain `int` is *always* signed (except for bit fields, where plain `int` has implementation-defined signedness). – Keith Thompson Sep 04 '13 at 01:50
- @EricZ: I thought the original question was about the default signedness of `char`. I have made that explicit in my answer. – wallyk Sep 04 '13 at 03:20
- Unless the OP edited it within the 5-minute window, the question doesn't mention `char` -- and the OP accepted an answer that doesn't mention `char`. – Keith Thompson Sep 04 '13 at 04:06
`unsigned` semantics are guaranteed to be simpler: arithmetic modulo 2^n with no exceptions. But don't make an assumption about what n is: the size of the range is not required to be equal to that of the corresponding signed type.
The only requirement is that all positive signed values can also be represented by the corresponding unsigned type.
One valid implementation of `unsigned` would be to use two's-complement signed arithmetic and zero out the sign bit after every operation. This isn't likely to show up in real life, but machines with non-two's-complement arithmetic can have more trouble trying to bypass negative number logic.
In practice, negative numbers are an essential feature of any hardware platform, but the ability to treat an entire register as a positive number is just icing on the cake. C is designed to wrap most tightly around the most efficient parts of the hardware.
