7

I know that there are rules for using underscores in identifiers in C/C++. Are there any rules for using them in source code filenames?

For instance, are there any restrictions against beginning or ending a filename with an underscore? Or having an underscore as the last character before the .c or .h extension? Double underscores?

References are appreciated, if there are any.

Machavity
cp.engr

3 Answers

9

If the source files are subject to preprocessor #include directives, then the C and C++ standards specify a minimum set of requirements on the filename. The C standard says:

6.10.2 Source file inclusion

...

  5. The implementation shall provide unique mappings for sequences consisting of one or more nondigits or digits (6.4.2.1) followed by a period (.) and a single nondigit. The first character shall not be a digit. The implementation may ignore distinctions of alphabetical case and restrict the mapping to eight significant characters before the period.

... where nondigit contains the letters A-Z, a-z and underscore.

The exact same text (except for the paragraph numbers) can also be found in the C++ standard, 16.2 Source file inclusion.
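To make the quoted rule concrete, here is a minimal sketch in C of a check for whether a header name falls within the set the standard guarantees a unique mapping for. The function name `is_guaranteed_header_name` and its helpers are hypothetical, not part of any library, and the check only considers the basic letters, digits and underscore. It does not check the eight-character limit, since that only affects whether two longer names are guaranteed to be distinct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a nondigit (C11 6.4.2.1) is a letter or an underscore.
   Implementation-defined extra characters are deliberately ignored here. */
static bool is_nondigit(unsigned char c)
{
    return c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool is_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Returns true if `name` has the form described in 6.10.2:
   one or more nondigits or digits (the first not a digit),
   followed by a period and a single nondigit. */
bool is_guaranteed_header_name(const char *name)
{
    size_t len = strlen(name);

    if (len < 3)                        /* shortest possible form is "x.y" */
        return false;
    if (name[len - 2] != '.')           /* period must be second to last */
        return false;
    if (!is_nondigit((unsigned char)name[len - 1]))
        return false;                   /* extension must be one nondigit */
    if (is_digit((unsigned char)name[0]))
        return false;                   /* first character must not be a digit */

    for (size_t i = 0; i < len - 2; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!is_nondigit(c) && !is_digit(c))
            return false;               /* e.g. '-', '~', '/' or a second '.' */
    }
    return true;
}
```

Under this reading, names such as `_foo_.h` or `foo__bar_.c` pass, because the underscore is an ordinary nondigit: leading, trailing and doubled underscores are all inside the guaranteed set. A name like `foo.cpp` fails only in the sense that the standard does not promise a mapping for it; in practice implementations accept it anyway.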

Beyond that, what passes for a valid filename depends on the operating system, file system, compiler, linker and other parts of the compilation tool chain.

These days, I'd expect most modern systems to allow almost anything that isn't directly forbidden by the file system.

References

  • The final public draft of the C11 standard, n1570
  • The final public draft of the C++11 standard, n3337
Community
Nisse Engström
  • I was really shocked by this. However, while the standard disallows some names, GCC at least allows anything that is allowed by the OS, even with `-std=c11 -pedantic-errors`, and Clang will probably do the same, so there is not much to worry about. `".//./1_2-3.4~5.hh"` compiled fine :) – alx - recommends codidact Feb 24 '19 at 11:01
  • 1
    @alx-recommendscodidact The standard doesn't disallow any file names. It allows (but doesn't require) an implementation to reject some header names, but header names needn't be file names. The implementation provides a *mapping* of `#include "..."` header names to source file names. That mapping is not specified (though it's typically one-to-one). – Keith Thompson Aug 13 '23 at 06:07
6

No. Files can be named whatever you want (given that the underlying file system supports the name); neither the C++ nor the C standard has any stake in that. There are rules about "_" in identifiers, yes, but they do not carry over to external things like file names.

Jesper Juhl
  • 3
    Files can be named whatever you want *if the underlying system supports the name*. – Keith Thompson Jan 16 '17 at 21:31
  • No, but it's worth mentioning in your answer. – Keith Thompson Jan 16 '17 at 21:51
  • `Files can be named whatever they want ...the C++ standard has no stake in that.` Is the same true of C? (I'm guessing yes...) If so, please add that to your answer so I can accept. – cp.engr Jan 17 '17 at 14:33
  • @cp.engr yes, that's also true for C (answer updated). – Jesper Juhl Jan 17 '17 at 15:21
  • 2
    ...Apparently your answer is incorrect, so I changed my accepted answer. – cp.engr Jan 17 '17 at 20:06
  • @cp.engr just because there are rules about how includes can be named, that doesn't mean that files in general can't be named differently. Also, includes don't have to map to actual files. I don't see how any of that invalidates my answer... – Jesper Juhl Jan 17 '17 at 22:34
  • 3
    Despite the lack of restriction on underscores - about which I asked, and which is consistent with your answer - the statement of yours quoted in my above comment is demonstrably false given Nisse Engström's citation. – cp.engr Jan 17 '17 at 22:43
2

Neither the C nor the C++ language (and please remember that they're two different languages) has any rules for source file names. Both specify the names for their standard headers, and follow certain conventions for those names, but those conventions are not imposed on other source files. (Note that standard headers are not necessarily even implemented as files.)

An operating system, or a file system, or a compiler, or some other part of the environment might impose some requirements.

More specifically, Unix-like systems typically permit any characters in file names other than '/' (which is the directory path delimiter) and '\0' (which is the string terminator), and compilers typically permit any valid file name (possibly paying attention to the extension to determine which language to compile). Windows disallows some other characters. Case-sensitivity varies from one system to another; foo.c and Foo.c may or may not name the same file. This can be significant for foo.c vs. foo.C; sometimes .C is used as an extension for C++ source. Use something else if your code might be used on a case-insensitive file system (Windows, macOS).
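As a rough illustration of the case-sensitivity point, the sketch below checks whether two file names would collide if case were ignored. The function name `names_collide_ignoring_case` is hypothetical, and the comparison assumes plain ASCII names; real case-insensitive file systems apply their own, more involved folding rules.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if the two names are equal once
   ASCII letter case is ignored. This only approximates what a
   case-insensitive file system does. */
bool names_collide_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

With this check, `foo.c` and `foo.C` are reported as colliding, which is why a C source file and a C++ source file distinguished only by the case of the extension cannot safely live in the same directory on such a system.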

There are a number of conventions for file extensions used to identify the contents of a source file, such as .c for C source, .h for a header file, .cpp or .cc or .cxx for C++ source, and so on. Consult your compiler's documentation.

Keith Thompson
  • `are you asking about C or C++ source file names` - yes. Question updated. – cp.engr Jan 16 '17 at 21:34
  • 3
    A header file name can't contain any characters that would result in invalid syntax in the `#include`, right? – Mark Ransom Jan 16 '17 at 22:41
  • 1
    Note that filename character encodings may differ by OS, too, in addition to disallowed characters. Last I checked, Linux filenames used UTF-8 Unicode, and Windows ones used UTF-16 Unicode, for example. – Justin Time - Reinstate Monica Jan 16 '17 at 22:47
  • @MarkRansom: Given `#include "foo.h"` the *q-char-sequence* `foo.h` (it's not a string literal) identifies a source file. It isn't necessarily the literal name of the source file. There could be implementation-defined syntax for encoding strange file names in `#include` directives. – Keith Thompson Jan 16 '17 at 22:54
  • 3
    `Neither the C nor the C++ language...has any rules for source file names.` Apparently this is not the case. Refer to the newly-accepted answer. – cp.engr Jan 17 '17 at 20:07
  • @cp.engr Yes, it is the case. Valid header names are *mapped* to file names. A conforming implementation could reject header names containing underscores (none do so as far as I know), but that doesn't directly imply anything about source file names. – Keith Thompson Aug 13 '23 at 06:00