What's the exact role of "significant characters" in C (variables)?

Question

What is the exact role of "significant characters" in C, especially for variable names? I have read the topic “(K&R) At Least the first 31 characters...”, but I really don't understand the exact rules of significant characters. The only thing I understand well is that this subject is obsolete, but I still need to know!

What don't you understand exactly in the answers to that question? — Mat, Aug 17 '13 at 15:07

Jonathan Leffler · Accepted Answer · 2013-08-17T15:53:36.370

In the current C standard, ISO/IEC 9899:2011, section §5.2.4.1 Translation limits says:

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:¹⁸⁾

...
— 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
— 31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)¹⁹⁾
...

¹⁸⁾ Implementations should avoid imposing fixed translation limits whenever possible.
¹⁹⁾ See ‘‘future language directions’’ (6.11.3).

§6.11.3 External names
¶1 Restriction of the significance of an external name to fewer than 255 characters (considering each universal character name or extended source character as a single character) is an obsolescent feature that is a concession to existing implementations.

This means that, for internal names, identifiers that are distinct within the first 63 characters must be treated as distinct by the compiler. But if you are misguided enough to create two (or more) identifiers that differ only in the 64th character (the first 63 are identical, but the 64th character is, say, 1 in one and z in the other), then the compiler may legitimately, and without warning, treat those two identifiers as the same.

External names (names that affect the linker rather than the compiler proper) may be limited to as few as 31 significant characters. Consider:

extern int abcdefghijkjlmnopqrstuvwxyz123456;
extern int abcdefghijkjlmnopqrstuvwxyz123457;

These two declarations may be treated as referring to the same variable if the system (linker) limits you to 31 characters.

As the future directions section states, any limit shorter than 255 characters is 'obsolescent', meaning that in practice you should not be limited by this before names are 255 characters long. However, the standard does not yet mandate 255 characters as the minimum.

History

Previous editions of the standard had smaller lower bounds on the upper limits of the lengths of names. The C89 standard mandated only 6 characters, monocase, for external names (a requirement regarded as a painful concession to existing linkers), so `strcmp` and `StrCmp` could be the same, as could `abcdefg` and `abcdefz`. Part of the trouble may have been Fortran: it only required support for 6-character monocase names, so linkers on systems where Fortran was widely used did not need to support longer names.

The limits in C99 were the same as in C11.

So can we infer that, since there are fewer limits in today's IDEs, their linkers may not follow the exact C99 standard rules? – kaymas, Aug 17 '13 at 16:17
@kaymas: re 'IDEs and C99' — most modern linkers provide limits at or beyond the lower bounds required by C99; ideally, they should not impose any limits on the lengths of names. This means that they meet the C99 standard (where the footnote 18 I quoted says that they should avoid imposing limits on the length of names when possible). A linker might contravene the C99 standard by not allowing names of at least 31 characters, but I've not heard of such a linker. An IDE should be aware of the standard and of any limitations of the compiler and linker it uses. — Jonathan Leffler, Aug 17 '13 at 17:41
I would add `N2346/6.4.2.1p6`: _If two identifiers differ only in nonsignificant characters, the behavior is undefined._ So, as you noted, _the compiler may legitimately, and without warning, treat those two identifiers as the same_, but it's UB anyway. – Some Name, Aug 27 '21 at 06:25

score 2 · Answer 2 · answered Aug 17 '13 at 15:08


In the old days of C, compilers and programs ran on machines with very limited memory (think kilobytes, not gigabytes), so to save memory the early compilers used only the first eight characters of identifiers (names of variables, functions, etc.). That is the role of "significant characters": it is the number of characters of each name in the source that the compiler actually uses.


Some programmer dude


OK, is it true that declaring two very similar variables whose names differ only in case was a big risk in the early days of C? – kaymas Aug 17 '13 at 15:18

@kaymas Oh yes, it most definitely was. With 8 significant characters, the two names `foo_bar_123` and `foo_bar_135` are both seen as `foo_bar_`, so they would be considered the same from the compiler's point of view. – Some programmer dude Aug 17 '13 at 15:20

I think the restriction was due to limitations in the linker, not in the compiler. – lhf Aug 17 '13 at 15:20

And some early (Fortran-compatible, mainframe-based?) linkers had a limit of **6** characters monocase on external names. The C89 standard was only able to mandate 6 characters monocase (so `A12345` and `a12345` were the same) for external names. By C99, they'd increased the requirement to 31 characters. Most systems support far more because C++ name mangling requires long names, in general. – Jonathan Leffler Aug 17 '13 at 15:23
@JoachimPileborg Sorry for my bad English. My question is about the case-sensitivity aspect of C. – kaymas Aug 17 '13 at 15:25

@kaymas Ah, okay. Case sensitivity has always been a part of C, but some early linkers did not care about the case of names, so e.g. a function named `FOO` and one named `foo` would cause redefinition errors. – Some programmer dude Aug 17 '13 at 15:28
@JonathanLeffler Thank you. But what would happen if there was more than one letter in the identifier? For example, were `Ab1234` and `AB1234` the same? What about today's linkers? – kaymas Aug 17 '13 at 15:30
@JoachimPileborg Thank you, that was exactly what I needed to know. – kaymas Aug 17 '13 at 15:35
With the early linkers, `Ab1234` and `AB1234` could be considered the same — that's the monocase bit. And `ab12345` and `ab1234z` could be considered the same — that's the 6 significant character limit. These limits, mercifully, never affected me (we had at least 8 significant characters, case-sensitive, even back in 1983), but the standard had to accommodate mainframe implementations of C. – Jonathan Leffler Aug 17 '13 at 15:49

Slugart · Answer 3 · 2013-08-17T17:26:27.643


There is no role: the number of significant characters is a limitation imposed by C linkers. The 31-character limit was used by early linkers.


answered Aug 17 '13 at 15:10

Slugart


Even in ISO/IEC 9899:2011, there are limits on the number of significant characters that an implementation must support. – Jonathan Leffler Aug 17 '13 at 15:26

score 0 · Answer 4 · answered Aug 17 '13 at 15:17


This is what I think:

It means that all characters after the first 31 will be ignored, i.e. the variable names:

ab..(27 characters)..yz123
ab..(27 characters)..yz578

will be treated as:

ab..(27 characters)..yz

and so you can get a redeclaration error.


0xF1


OK, but what about case? Imagine these two variables both have 31 characters: ab..(27 characters)..yz and ab..(27 characters)..yZ. Are they still one variable? – kaymas Aug 17 '13 at 15:21
@kaymas: Some languages are case-sensitive, and some aren't, and it's possible to link together components written in a mixture of case-sensitive and non-case-sensitive languages. Some systems require that non-case-sensitive languages must export all identifiers in uppercase only (so if function Foo is written in Pascal, C code would need to invoke it as FOO) but others use case-insensitive linking. – supercat Jan 28 '16 at 17:26
Saying 'will be ignored' is putting it too strongly. Using 'might be ignored' would be more accurate. – Jonathan Leffler Mar 04 '17 at 17:12
