
I have a question regarding section 5.2.4.1 Translation Limits in the first American National Standard for Programming Languages - C, also known as ANSI/ISO 9899-1990, ISO/IEC 9899:1990 (E), C89, etc. Simply put, the first ANSI C standard.

What does the standard say that is so strange?

It infamously states that a conforming C compiler is only required to handle, and I quote:

5.2.4.1 Translation Limits

  • 6 significant initial characters in an external identifier

Now, it is painfully obvious that this is unreasonably short, especially considering that C does not have anything similar to a name space. It is especially important to allow for descriptive names when dealing with external identifiers, seeing how they will "pollute" everything you link.

Even the standard library mandates functions with longer names: longjmp, tmpfile, strncat. The latter, strncat, shows that they had to work a bit to invent library names whose initial six characters were unique, instead of the arguably more logical strcatn, which would have collided with strcat.

Why is it still a problem to me?

I enjoy oldish computers. I'm trying to write programs that will compile and work well on pre-C99 platforms, since C99 sometimes does not exist for my beloved targets. Perhaps I also enjoy trying to really follow the standard. I have learned a lot about C99 and C11 just by digging through the older standards, trying to trace the reasons for certain limitations and implementation issues.

So, even though I know of no compiler or linker actually enforcing or imposing this limitation, it still nags me that I cannot claim to have written strictly conforming code if I also want to use legible and non-colliding external identifiers.

Why would they impose such a thing?

They began work on the standardization some time during the early eighties, and finalized it in 1988 or 1989. Even in the seventies and sixties, it would not have been any problem whatsoever to handle longer identifiers.

Considering that any compiler wanting to conform to the new standard must be modified - if only to update the documentation - I don't see how it would have been unreasonable for ANSI to put its foot down and say something similar to "It is 1989 already. You must handle 31 significant initial characters". It would not have been a problem for any platform, even ancient ones.

Backwards compatibility?

From what I've read when searching for this, the problem might come from FORTRAN. In an answer to the question What's the exact role of "significant characters" in C (variables)?, Jonathan Leffler writes:

Part of the trouble may have been Fortran; it only required support for 6 character monocase names, so linkers on systems where Fortran was widely used did not need to support longer names.

To me, this seems like the most reasonable answer to the direct question Why?. But considering that this restriction bugs me every time I want to write a program that could theoretically be built on old systems, I would like to know some more details.


Questions

  1. After having searched a bit along the FORTRAN track, I've only come up with theories and hand-waving. Which popular platforms actually imposed a limit of only 6 characters? Was there a particularly popular linker that forced the standards committee to budge?
  2. I'm not old enough to have been interested in this kind of detail when it was discussed. Were this limit and its rationale publicly discussed and defended? Was there a public outcry, or was it silently ignored? Pitchforks outside the ANSI headquarters?

Ultimately, the answers to these questions will make it easier for me to decide how badly I should sleep at night for giving reasonable names to my functions.

pipe

2 Answers


30 years ago - I was there - the vast majority of the world's code was written in Cobol, Fortran and PL/1 and the vast majority of that ran on IBM 370-series mainframe computers, or compatibles. Most of the C code in the world ran on DEC's PDP-11 and VAX mini-computers. Unix and C were born on the PDP and DEC hardware was their stronghold.

This was the world from which the ANSI C committee came and in which they considered the practicalities of linking code written in C with the languages that really mattered, on the systems that really mattered.

Fortran compilers were Fortran 77 compilers and restricted identifiers to 6 characters. PL/1 compilers, back then, restricted external identifiers to 7 characters. The S/370 system linker truncated symbols to 8 characters. Not at all coincidentally, the PDP-11 assembly language required symbols to be unique within the first 6 characters.

There weren't any pitchforks on the lawn of the ANSI C committee when it stipulated 6 initial significant characters for external identifiers. That limit meant a conforming compiler could be implemented on IBM mainframes; it need not be one for which the PDP-11 assembler would be inadequate, and it need not emit code that couldn't even be linked with Fortran 77. It was a wholly unsensational choice. The ANSI C committee could no more have "put its foot down" to change the IBM mainframe linker than it could have laid down the law about Soviet missile design.

"It is 1989 already. You must handle 31 significant initial characters. It would not have been a problem for any platform, even ancient ones."

You're wrong about that. Run Moore's Law backwards mentally for 30 years and try to imagine how puny computers were while that committee was at work. A mainframe computer that supported hundreds of users as well as running all the data-processing systems of a large corporation typically did it with less than the processing power, memory and storage resources I've got in my old Google Nexus tablet today.

An IBM 3380E hard disc unit, 1985, had a capacity of 5.0GB and cost around $120K; $270K in today's money. It had a transfer rate of 24Mbps, about 2% of what my laptop's HD delivers. With parameters like that, every byte that the system had to store, read or write, every disc rotation, every clock cycle, weighed on the bottom line. And this had always been the case, only more so. A miser-like economy of storage, at byte granularity, was ingrained in programming practice and those short public symbol names was just one ingrained expression of it.

The problem was not, of course, that the puny, fabulously expensive mainframes and minis that dominated the culture and the counsels of the 1980s could not have supported languages, compilers, linkers and programming practices in which this miserly economy of storage (and everything else) was tossed away. Of course they could, if everybody had one, like a laptop or a mobile phone. What they couldn't do, without it, was support the huge multi-user workloads that they were bought to run. The software needed to be excruciatingly lean to do so much with so little.

Mike Kinghan
  • Thank you for this amazing answer! I _did_ fail to take the multi-user mainframes into account, although I'm still not convinced that it is a good reason. Whatever code is being compiled, the storage for external declarations will drown in everything else you have to store - both in memory and on disk. I can however understand that it may have been _seen_ as a reason at the time. – pipe Jun 27 '16 at 01:04
  • The limit on link names was often built into operating systems, which all date from much earlier (e.g. 1964 for the mainframe). The IBM mainframe stores load modules as members in a PDS, and member names have a limit of 8 characters. This limit dates from 1964. – Bruce Martin Jun 27 '16 at 07:33
  • MS-DOS / Windows used 8-character names (+ 3-character type). For a long time, longer file names were implemented as a fudge (with the 8-character name being unique). From memory, Windows XP (2002?) was the first version to fully support long file names. Prior to that, all system programs had to be 8 characters or less. – Bruce Martin Jun 27 '16 at 07:40
  • Would there have been any difficulty specifying a syntax for defining functions whose internal and external names differed (e.g. so one could declare e.g. `double integrate (double p[]) extern "int" ` to indicate that the function should be defined within C using the name "integrate", but the imported or exported symbol should be called "int" [a name which would otherwise be impossible to use in C])? Some particular implementations have such a feature, but would there have been any difficulty with having a standard syntax for it? – supercat Jun 29 '16 at 23:19
  • @supercat That neat idea would have been perfectly feasible as far as I can see, but I never met with any ancient compiler that supported it myself. People weren't much vexed by the fact that external names had to be short enough for the even more ancient linker. It was only *after* the ascendancy of C, with its revolutionary "standard library", that externalized APIs really took off in software development and people *wanted* to link loads of expressive symbols. – Mike Kinghan Jun 30 '16 at 17:46
  • @MikeKinghan: I've seen compilers with #pragma directives for that purpose (setting the external name of the next identifier declared) and I think MSVC has a non-pragma directive for that purpose. I can't think of any platform for which a compiler couldn't provide such functionality if use of a non-platform-supported name would make a program "ill-formed no diagnostic required". – supercat Jun 30 '16 at 18:29
  • "byte granularity" should perhaps read "bit granularity" -- I have dealt with message protocols that encode sub-8-bit values across byte boundaries! – M.M Jan 31 '17 at 13:04
  • What prevented the compiler from taking the source, giving everything short unique names, and then compiling that? – Aaron Franke Jul 30 '19 at 04:30
  • Funny thing is, that old 1989 computer actually *could* serve those 100 users, while one is often too much for a current device today. – peterh May 08 '20 at 19:36
  • @AaronFranke [Pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle)? – Aykhan Hagverdili Nov 01 '22 at 06:18
  • @AyxanHaqverdili: The fact that people might want to invoke functions that were written in C, from within code written in other languages. If C build systems substituted their own arbitrary names, adjusting names as needed to avoid duplication, there would be no reliable way to know how to call a C function from within other languages. – supercat Jan 11 '23 at 22:05
  • @supercat Are you trying to reply someone else? I don't see how that's a reply to my comment. – Aykhan Hagverdili Jan 11 '23 at 22:09
  • @AaronFranke: See my previous comment to AyxanHaqverdili. – supercat Jan 11 '23 at 22:10

At BTL I originally wrote the "n" functions, strcpyn, etc., around 1977 ... forgetting that at that time, C on IBM S/370 with IBM OSes (MVS, VM/CMS) limited externals to 7 characters, so others changed this to strncpy, etc., as we were amidst serious efforts to make C portable across many legacy systems, including IBM and the Univac 11xx. By 1980, the limit had been raised to 8, but the strn names were already in use.