631

I know that / is illegal in Linux, and * " / \ < > : | ? are illegal in Windows.

What else am I missing? I need a comprehensive guide that also accounts for double-byte characters.

Mateen Ulhaq
Jeff
  • 31
    Some of the characters you mention are in fact allowed on Windows. Check this: `echo abc > "ab.;,=[1]"` – dolmen Apr 17 '15 at 14:07
  • 11
    Also don't forget < and > are illegal on Windows. – AnotherParker Mar 14 '16 at 13:30
  • 2
    just because win32 API passes it doesn't mean it's allowed. read the NTFS specs and FAT32 specs first before jumping in with RCS and CVS on windows. – Jim Michaels Jan 18 '17 at 03:24
  • 4
    `^` is forbidden on FAT – eckes Jun 10 '17 at 17:57
  • 7
    @DavidC.Bishop: [This SO post](https://stackoverflow.com/questions/9847288/is-it-possible-to-use-in-a-filename#9847573) asserts that the Linux kernel will prevent you from working with a filename containing a slash. Have you been able to make it work? – Soren Bjornstad Sep 03 '18 at 16:20
  • 50
    "/ isn't illegal in Linux. You just have to escape it with a \ when typing it in" -- this statement is completely wrong. filename components cannot contain /, and escaping it has no effect. – Jim Balter Oct 08 '18 at 18:39
  • 5
    I'm testing on NTFS only and can say that . [ ] = : ; and , appear to be fine. I did not test FAT32 – naskew Oct 10 '18 at 07:48
  • 2
    `;` isn't illegal in file or folder names in Windows. I use it all the time as a pseudo-replacement for `:`. Ex: I might name a folder in Windows `std;;string` to document info about [std::string](http://www.cplusplus.com/reference/string/string/). And in place of a double quote on Windows (`"`), I just do two single quotes side-by-side, which looks close enough: `''`. – Gabriel Staples Apr 21 '20 at 00:18
  • A slightly different question is "what is an impossible directory name or filename in Linux?" Trying to create a file/directory with the empty string as a filename always fails. The empty string can be passed as an argument to a program/function as ''. I can't see anything else that can be passed as an argument that absolutely cannot be a filename. Even '//' and '/..' simply resolve to '/' and so are legal directory names (when using them in the bash shell). – Craig Hicks Jul 14 '20 at 05:02
  • There is a difference between being disallowed as a path character (`" < > |`) and being forbidden as a file name char (`: * ? \ /` + path chars) – Cadoiz Sep 17 '21 at 07:21
  • I know that the rules are intricate, but just for "sane" file names, I found the following covered all my cases for a file *path*: `[A-Z_a-z0-9 %.,+/-]`. The slash is the separator between folders/directories in a path. These are for files on Windows and Cygwin. – user2153235 Jan 11 '23 at 03:02

20 Answers

1000
  1. The forbidden printable ASCII characters are:

    • Linux/Unix:

        / (forward slash)
      
    • Windows:

        < (less than)
        > (greater than)
        : (colon - sometimes works, but is actually NTFS Alternate Data Streams)
        " (double quote)
        / (forward slash)
        \ (backslash)
        | (vertical bar or pipe)
        ? (question mark)
        * (asterisk)
      
  2. Non-printable characters

    If your data comes from a source that would permit non-printable characters then there is more to check for.

    • Linux/Unix:

        0 (NULL byte)
      
    • Windows:

        0-31 (ASCII control characters)
      

    Note: While it is legal under Linux/Unix file systems to create files with control characters in the filename, it might be a nightmare for the users to deal with such files.

  3. Reserved file names

    The following filenames are reserved:

    • Windows:

        CON, PRN, AUX, NUL 
        COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
        LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9
      

      (both on their own and with arbitrary file extensions, e.g. LPT1.txt).

  4. Other rules

    • Windows:

      Filenames cannot end in a space or dot.

    • macOS:

      You didn't ask for it, but just in case: Colon : and forward slash / depending on context are not permitted (e.g. Finder supports slashes, terminal supports colons). (More details)
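The four rule groups above can be sketched as a pair of checks. This is only a minimal sketch: the function names `is_valid_linux_name` and `is_valid_windows_name` are made up for illustration, the reserved-name set is exactly the list given in this answer (nothing more), and it says nothing about length limits or what a particular file system adds on top.

```python
import re

# Printable characters listed as forbidden on Windows, plus control codes 0-31.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Reserved device names from the list above (compared case-insensitively).
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_valid_linux_name(name: str) -> bool:
    """Only '/' and the NUL byte are forbidden inside a single name component."""
    # "." and ".." already exist in every directory, so they cannot name a new file.
    if name in ("", ".", ".."):
        return False
    return "/" not in name and "\0" not in name


def is_valid_windows_name(name: str) -> bool:
    """Apply the character, trailing dot/space and reserved-name rules."""
    if not name or WINDOWS_FORBIDDEN.search(name):
        return False
    if name[-1] in " .":
        return False
    # Reserved names stay reserved with an extension, e.g. LPT1.txt.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem not in WINDOWS_RESERVED
```

For example, `is_valid_windows_name("lpt1.txt")` is rejected because of the reserved stem, while `is_valid_linux_name("a:b*?")` passes, since none of those characters matter on Linux.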

iono
Christopher Oezbek
  • 2
    "CONIN$" and "CONOUT$" are also reserved. Unlike "CON" they allow accessing the console input and screen buffer with read-write access. Prior to Windows 8, only the base filenames are reserved. Starting with Windows 8 the underlying console IPC was redesigned to use a device driver, so these two names are now handled generically as DOS devices, the same as "NUL", etc. This means they can be used in local-device paths such as "\\.\CONIN$" and "\\?\CONOUT$" and also that the API pretends the names 'exist' in every existing directory. For example, "C:\Temp\CONOUT$" references console output. – Eryk Sun May 18 '18 at 14:15
  • 4
    Note that the reserved DOS device names and the rule about filenames ending in a dot or spaces are applied by the runtime library when converting a DOS path to a native NT path. If a path starts with the "\\?\" local-device prefix, this normalization step gets skipped, except to replace "\\?\" with NT's "\??\" device prefix. This prefix instructs the object manager to search the logon-session and global DOS device directories for a symbolic link to a native NT device, which is usually a device object in the "\Device" directory. – Eryk Sun May 18 '18 at 14:26
  • 4
    OTOH, the reserved characters are not simply a function of the DOS namespace. They're reserved at a low level in the kernel and file system. The "\" character is NT's path separator and is reserved by the object manager. Everything else is allowed in object names, which includes DOS device names such as "C:". The other reserved characters, including ASCII control characters, are due to the kernel's file-system runtime library, which is used by Microsoft's file systems. These characters are reserved in primary filenames, not in stream names. – Eryk Sun May 18 '18 at 14:36
  • 5
    The `*?<>"` characters are reserved as [wildcard characters](https://blogs.msdn.microsoft.com/jeremykuhne/2017/06/04/wildcards-in-windows). This is due to a peculiar design decision to have file systems implement filtering a directory listing at a low level in their implementation of the `NtQueryDirectoryFile` system call. In POSIX systems this is implemented at the application level. – Eryk Sun May 18 '18 at 14:42
  • 1
    You can name a file with a forward slash on most Linux distros just fine. There may be problems retrieving it though. It's not forbidden, it's just stupid. You can create a file outside the shell (which would automatically parse the `/` as a path separator), e.g. with a C program or Python script. – Mad Physicist Jun 21 '18 at 15:45
  • 7
    "You can name a file with a forward slash on most Linux distros just fine." -- No, you can't. '/' is always treated as a directory separator, by the kernel, not just the shell. There's no way to get around this with a C program or Python script or any other way. – Jim Balter Oct 08 '18 at 18:55
  • 4
    Fun fact: Using Cygwin you can readily create `lpt1` and `lpt1.txt`. **Then try deleting them** in Windows Explorer: You can't. Or in `cmd.exe`: You can't. Cygwin can, though. It appears to be a 1980s restriction that is held up artificially. – Lutz Prechelt Mar 18 '19 at 14:43
  • 2
    Supplementary fun fact: you can programmatically create files with "*" and "?" in the name on windows. So technically, not illegal; just a very very bad idea. (The solution to deleting a file named "lpt1", by the way, is "ren lpt? lptx". Deleting a file named *.\* might be more challenging). – Robin Davies May 10 '20 at 23:14
  • @RobinDavies, you cannot create filenames with "*" and "?" in their names on a proper Windows filesystem, unless you're hacking the underlying filesystem data structures. These are reserved wildcard characters by every filesystem except for the named-pipe filesystem. Any filesystem driver that doesn't reserve them -- as well as the other wildcards `<`, `>`, and `"` -- is fundamentally broken. It will not function properly with `FindFirstFileW`, which depends on the filesystem to support DOS wildcard matching (requires all 5 wildcards) in the `NtQueryDirectoryFile` system call. – Eryk Sun Jul 13 '20 at 17:56
  • 1
    It is not a nightmare to deal with non-printable characters in filenames from shell scripts, although it is harder than we would like. – peterh Sep 18 '20 at 18:31
  • 3
    On MacOS, the only forbidden printable ASCII character is `:`. Using the Windows superset of forbidden characters is sensible because it covers Linux and MacOS too. – AlainD Feb 10 '21 at 12:27
  • I just confirmed @AlainD's comment. The only character I wasn't allowed to use in my file name is the colon. However, the reason for investigating was that I received a file from a Windows user with a colon in its name. – Dark Star1 Mar 03 '21 at 11:41
  • There is a difference between being disallowed as a path character (`" < > |`) and being forbidden as a file name char (`: * ? \ /` + path chars) – Cadoiz Sep 17 '21 at 07:22
  • 2
    Filenames are technically able to end in a space in Windows, but File Explorer is not able to properly interact with them. Such a file can only be handled using UNC paths. To see for yourself, you can do `echo; > "\\?\%CD%\test "` in the command prompt. You'll notice you can't delete or open it in Explorer. Use `del "\\?\%CD%\test "` to get rid of it or `ren "\\?\%CD%\test " "test"` to rename it. Not really useful information, but it's a handy thing to know about if you ever run into a file with a trailing space. – Vopel Dec 13 '21 at 18:01
  • Upvoted. Extending your answer: [how to find and fix illegal Windows chars on Linux](https://stackoverflow.com/a/76794738/4561887). – Gabriel Staples Jul 29 '23 at 17:28
267

A “comprehensive guide” of forbidden filename characters is not going to work on Windows because it reserves filenames as well as characters. Yes, characters like * " ? and others are forbidden, but there are an infinite number of names composed only of valid characters that are forbidden. For example, spaces and dots are valid filename characters, but names composed only of those characters are forbidden.

Windows does not distinguish between upper-case and lower-case characters, so you cannot create a folder named A if one named a already exists. Worse, seemingly-allowed names like PRN and CON, and many others, are reserved and not allowed. Windows also has several length restrictions; a filename valid in one folder may become invalid if moved to another folder. The rules for naming files and folders are in the Microsoft docs.

You cannot, in general, use user-generated text to create Windows directory names. If you want to allow users to name anything they want, you have to create safe names like A, AB, A2 et al., store user-generated names and their path equivalents in an application data file, and perform path mapping in your application.

If you absolutely must allow user-generated folder names, the only way to tell if they are invalid is to catch exceptions and assume the name is invalid. Even that is fraught with peril, as the exceptions thrown for denied access, offline drives, and out of drive space overlap with those that can be thrown for invalid names. You are opening up one huge can of hurt.
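The "try it and catch the exception" approach could look roughly like the sketch below. `try_create` is a hypothetical helper, not an API from any library, and as the paragraph above warns, a `False` result cannot tell an invalid name apart from denied access or a full disk. It also assumes `name` is a single component; a caller would still have to reject separators such as `..\` or `../` before calling it.

```python
import os
import tempfile


def try_create(directory: str, name: str) -> bool:
    """Return True if an empty file called `name` could be created in `directory`."""
    path = os.path.join(directory, name)
    try:
        # O_EXCL makes the call fail instead of silently reusing an existing file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except (OSError, ValueError):
        # OSError covers invalid names, but also denied access, offline drives,
        # full disks and so on. ValueError is raised for an embedded NUL character.
        return False
    os.close(fd)
    os.remove(path)
    return True


# Probe a scratch directory rather than the real target, just for demonstration.
with tempfile.TemporaryDirectory() as scratch:
    print(try_create(scratch, "report.txt"))
```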

Legorooj
Dour High Arch
  • 12
    The key phrase from the MSDN link is "[and a]ny other character that the target file system does not allow". There may be different filesystems on Windows. Some might allow Unicode, others might not. In general, the only safe way to validate a name is to try it on the target device. – Adrian McCarthy Dec 29 '09 at 19:02
  • 127
    There are some guidelines, and *“there are a infinite number of names composed only of valid characters that are forbidden”* isn't constructive. Likewise *“Windows does not distinguish between upper-case and lower-case characters”* is a foolish exception — the OP is asking about syntax and not semantics, and no right-minded people would say that a file name like `A.txt` was *invalid* because `a.TXT` may exist. – Borodin Jan 27 '16 at 22:41
  • 2
    The idea that you shouldn't permit user access to file structure addresses is sound but very poorly phrased. Users should be able to examine and manipulate the entities that the application exposes to them. While those entities may be dynamically-named abstracts of multiple databases, there is nothing wrong with asking the user for the name of a file. The securities on an application should prevent users from making mistakes and from exceeding their authority; they should not prevent them from doing what they need to do – Borodin Jan 27 '16 at 22:49
  • 1
    I regularly use Perl, and my habit is to use strings quoted as `q< ... >` because neither `<` nor `>` are valid within a Windows file path. I suspect that the restrictions are archaic and intended to avoid characters that are significant in a DOS environment, or at least within a Windows command shell – Borodin Jan 28 '16 at 02:15
  • 11
    `COPY CON PRN` means read from keyboard input, or possible stdin, and copy it to the printer device. Not sure it is still valid on modern windows, but certainly was for a long time. In the old days you could use it to type text and have a dot-matrix printer simply output it. – AntonPiatek Apr 11 '16 at 10:35
  • 7
    "You cannot, in general, use user-generated text to create Windows directory names." <-- If you want to do this you can just have a character whitelist and it'll largely work, if you can ignore the already-exists issue. – Casey Oct 16 '17 at 17:59
  • 6
    That observation "You cannot, in general, use user-generated text to create Windows directory names" is a bit ludicrous to be honest. There are plenty of cases where you want to allow users to name their files and folders, so just saying "don't do it" is not helpful. – laurent Nov 19 '18 at 23:43
  • 14
    @JimBalter Unless I've misunderstood, it's not constructive because "infinite number of names composed only of valid characters that are forbidden" is rather meaningless if the rules for filenames are well-defined and themselves not infinite. Nothing in this answer justified describing the possibilities as infinite in a way that is helpful or useful to the reader. E.g. contrast the following: (1) In Linux, "/" is not allowed. (2) No comprehensive guide for Linux is possible because there are an infinite number of disallowed names e.g. "/", "//", "///", "a/a", "b/b", etc. – JBentley Apr 18 '20 at 17:54
  • 3
    Please note that the assumption that Windows file names are case-insensitive is incorrect. To make matters worse, case sensitivity rules on Windows [can now be set per-directory](https://learn.microsoft.com/en-us/windows/wsl/case-sensitivity). – dialer Sep 12 '21 at 15:19
91

Under Linux and other Unix-related systems, there were traditionally only two characters that could not appear in the name of a file or directory, and those are NUL '\0' and slash '/'. The slash, of course, can appear in a pathname, separating directory components.

Rumour¹ has it that Steve Bourne (of 'shell' fame) had a directory containing 254 files, one for every single letter (character code) that can appear in a file name (excluding /, '\0'; the name . was the current directory, of course). It was used to test the Bourne shell and routinely wrought havoc on unwary programs such as backup programs.

Other people have covered the rules for Windows filenames, with links to Microsoft and Wikipedia on the topic.

Note that MacOS X has a case-insensitive file system. Current versions of it appear to allow colon : in file names, though historically, that was not necessarily always the case:

$ echo a:b > a:b
$ ls -l a:b
-rw-r--r--  1 jonathanleffler  staff  4 Nov 12 07:38 a:b
$

However, at least with macOS Big Sur 11.7, the file system does not allow file names that are not valid UTF-8 strings. That means the file name cannot consist of the bytes that are always invalid in UTF-8 (0xC0, 0xC1, 0xF5-0xFF), and you can't use the continuation bytes 0x80..0xBF as the only byte in a file name. The error given is 92 Illegal byte sequence.
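A rough way to pre-check a raw byte name against that rule is to try decoding it. This is a sketch under the assumption that the rule really is "valid UTF-8 without `/` or the null byte", which is what was inferred from the experiments above; `is_valid_utf8_name` is a made-up helper name. Python's strict decoder also rejects encoded surrogates, which the experiments did not cover.

```python
def is_valid_utf8_name(raw: bytes) -> bool:
    """Check a single name component given as raw bytes."""
    if not raw or b"/" in raw or b"\0" in raw:
        return False
    try:
        # Strict decoding rejects 0xC0, 0xC1, 0xF5-0xFF and lone continuation bytes.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With this, `is_valid_utf8_name(b"a:b")` passes, while `b"\xc0"` and `b"\x80"` are rejected.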

POSIX defines a Portable Filename Character Set consisting of:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . _ -

Sticking with names formed solely from those characters avoids most of the problems, though Windows still adds some complications.
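Checking a name against the portable set is a one-line regular expression. A minimal sketch, with `is_portable_name` as an illustrative name only; it tests the character set and nothing else, so Windows reserved names such as `CON` still pass.

```python
import re

# The POSIX Portable Filename Character Set listed above.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_name(name: str) -> bool:
    """True if every character of `name` is in the portable set."""
    return PORTABLE_NAME.fullmatch(name) is not None
```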


¹ It was Kernighan & Pike in ['The Practice of Programming'](https://www.cs.princeton.edu/~bwk/tpop.webpage/) who said as much in Chapter 6, Testing, §6.5 Stress Tests:

When Steve Bourne was writing his Unix shell (which came to be known as the Bourne shell), he made a directory of 254 files with one-character names, one for each byte value except '\0' and slash, the two characters that cannot appear in Unix file names. He used that directory for all manner of tests of pattern-matching and tokenization. (The test directory was, of course, created by a program.) For years afterwards, that directory was the bane of file-tree-walking programs; it tested them to destruction.

Note that the directory must have contained entries . and .., so it was arguably 253 files (and 2 directories), or 255 name entries, rather than 254 files. This doesn't affect the effectiveness of the anecdote, or the careful testing it describes.

TPOP was previously at http://plan9.bell-labs.com/cm/cs/tpop and http://cm.bell-labs.com/cm/cs/tpop but both are now (2021-11-12) broken. See also Wikipedia on TPOP.

Jonathan Leffler
  • 1
    254 files? And what about utf8? – j_kubik Sep 09 '12 at 01:33
  • 28
    The 254 files were all single-character file names, one per character that was permitted in a filename. UTF-8 wasn't even a gleam in the eye back when Steve Bourne wrote the Bourne shell. UTF-8 imposes rules about the valid sequences of bytes (and disallows bytes 0xC0, 0xC1, 0xF5-0xFF altogether). Otherwise, it isn't much different — at the level of detail I'm discussing. – Jonathan Leffler Sep 09 '12 at 01:37
  • 2
    The on-disk directory separator for MacOS HFS+ filesystems is actually a ':' rather than a '/'. The OS usually (probably always) does the right thing when you are working with *nix APIs. But don't expect this to happen reliably if you are moving to the OSX world, e.g. with applescript. It looks like maybe Cocoa APIs use the / and hide the : from you too, but I am pretty sure the old Carbon APIs don't. – Dan Pritts Dec 09 '13 at 16:07
  • @DanPritts I created a custom font/colour scheme in Xcode's preferences, naming it with a `/` in the name. That caused some issues, as it created a new directory with the scheme in. – Andreas is moving to Codidact Jun 28 '19 at 10:07
  • 1
    Note that if a directory has a colon in its name, you cannot add the directory to a Unix `PATH` variable because colon is used as the separator (semicolon on Windows). So, programs in such a directory must either be run with a pathname that specifies where they are (could be relative or absolute), or you must be in the directory and have dot (`.`, the current directory) in `PATH`, which is widely regarded as unsafe. – Jonathan Leffler Apr 24 '20 at 15:13
  • It would be interesting to discuss what are the differences between filesystems regarding limitations, as it doesn't only depend on the OS. I'm also thinking in direction of characters with more than 8 bit as [proposed in my answer](https://stackoverflow.com/a/61448658/4575793). I think that it would be a bold claim to say that every char works in unix everywhere besides the few named ones. – Cadoiz Oct 24 '22 at 07:51
  • @Cadoiz — for traditional Unix, "it works with all permitted 8-bit characters". For non-traditional Unix (such as, perhaps, esoteric — aka modern — Linux file systems, or Apple or Windows), there might be systems that support only strict UTF-8 (which rules out 13 bytes — 0xC0, 0xC1, 0xF5..0xFF). There might be some file systems which support UTF-16 names (which would then include null bytes for characters with a match in ASCII, etc). However, the standard C interfaces do not cope well with UTF-16, so it's unlikely that there are many such file systems on systems where C is the main language. – Jonathan Leffler Oct 24 '22 at 15:02
  • 1
    FWIW: on a MacBook Pro running macOS Big Sur 11.7, I can create a file with the name `:`, but I cannot create a file with the complete name being any of the single bytes 0xC0, 0xC1, 0xF5..0xFF, nor with any of the UTF-8 continuation bytes 0x80..0xBF. The error number is 92 "Illegal byte sequence". From that, I infer that macOS insists that file names be valid UTF-8 strings not containing `/` or the null byte. I haven't checked non-characters like U+FFFF, the surrogate ranges, the PUA (private use area) ranges, and characters in the unassigned code blocks such as U+80000..U+8FFFF. – Jonathan Leffler Oct 24 '22 at 15:42
46

Instead of creating a blacklist of characters, you could use a whitelist. All things considered, the range of characters that make sense in a file or directory name context is quite short, and unless you have some very specific naming requirements your users will not hold it against your application if they cannot use the whole ASCII table.

It does not solve the problem of reserved names in the target file system, but with a whitelist it is easier to mitigate the risks at the source.

In that spirit, this is a range of characters that can be considered safe:

  • Letters (a-z A-Z) - Unicode characters as well, if needed
  • Digits (0-9)
  • Underscore (_)
  • Hyphen (-)
  • Space
  • Dot (.)

And any additional safe characters you wish to allow. Beyond this, you just have to enforce some additional rules regarding spaces and dots. This is usually sufficient:

  • Name must contain at least one letter or number (to avoid only dots/spaces)
  • Name must start with a letter or number (to avoid leading dots/spaces)
  • Name may not end with a dot or space (simply trim those if present, like Explorer does)

This already allows quite complex and nonsensical names. For example, these names would be possible with these rules, and be valid file names in Windows/Linux:

  • A...........ext
  • B -.- .ext

In essence, even with so few whitelisted characters you should still decide what actually makes sense, and validate/adjust the name accordingly. In one of my applications, I used the same rules as above but stripped any duplicate dots and spaces.
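As a sketch of how these rules might be applied (not the code from my application; `sanitize_name` is just an illustrative name), the following strips everything outside the whitelist, collapses duplicate dots and spaces, and enforces the start and end rules. It relies on `\w` being Unicode-aware in Python 3, so letters from other scripts are kept. Reserved names are not handled here.

```python
import re
from typing import Optional


def sanitize_name(name: str) -> Optional[str]:
    """Apply the whitelist rules above; return None if nothing usable is left."""
    # Keep only letters and digits (Unicode-aware \w), underscore, hyphen, space and dot.
    cleaned = re.sub(r"[^\w\- .]", "", name)
    # Collapse repeated dots or repeated spaces into a single one.
    cleaned = re.sub(r"([. ])\1+", r"\1", cleaned)
    # Trim trailing dots and spaces.
    cleaned = cleaned.rstrip(". ")
    # Strip leading characters until the name starts with a letter or digit.
    cleaned = re.sub(r"^[\W_]+", "", cleaned)
    # An empty result means the input held no letter or digit at all.
    return cleaned or None
```

With this sketch, `A...........ext` becomes `A.ext`, and a name made only of dots and spaces yields `None`.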

AeonOfTime
  • 41
    And what about my non-english-speaking users, who would all be screwed by this? – pkh May 13 '16 at 22:06
  • 3
    @pkh: As I mentioned in my post, you would include any needed unicode characters in your whitelist. Ranges of characters can usually be specified quite easily, especially if you use regular expressions for example. – AeonOfTime May 18 '16 at 17:04
  • 5
    We use a whitelist approach, but don't forget on Windows you have to manage reserved, case-independent strings, like device names (prn, lpt1, con) and . and .. – tahoar Oct 12 '16 at 18:46
  • in DOS, - (hyphen) is not allowed. command.com I think converts it to _ or ignores it depending on kind of DOS. – Jim Michaels Jan 18 '17 at 03:29
  • 3
    You've missed the Windows restriction: must not end in dot or space. – Martin Bonner supports Monica Jan 29 '19 at 10:25
  • Thanks @MartinBonner, I added that info. I tried it in Windows Explorer and the command line, it simply trims the trailing spaces or dot - still, there's no guarantee the programming language one uses will always safely do that for you - not to mention creating files that suddenly do not match the name you used in your application. – AeonOfTime Jan 29 '19 at 15:31
  • @mikerodent `\p{L}` is a good start and is available in some regexp engines. But it wouldn't allow `à` if it occurs in decomposition form: the accent isn't a letter. See https://www.regular-expressions.info/unicode.html – LarsH Jun 04 '19 at 13:25
  • 3
    "you would include any needed unicode characters in your whitelist. Ranges of characters can usually be specified quite easily" - To do this for arbitrary (not known ahead of time) languages would be non-trivial. In some regexp engines you can use categories, like `\p{L}\p{M}*` (https://www.regular-expressions.info/unicode.html) to whitelist any letters together with their diacritics. But it wouldn't include the equivalent of digits, period, hyphen, underscore, etc. in non-Roman scripts. – LarsH Jun 04 '19 at 13:34
  • 6
    "All things considered, the range of characters that make sense in a file or directory name context is quite short." Maybe for some use cases. I'm working on a project now involving media files in 20 languages, and the filenames need to reflect the title of the media item because end users will be finding the content that way. Many of the names use punctuation. Any restriction on filename characters carries a price, so in this case we have to minimize restrictions. In this use case, the range of characters that *don't* make sense in a filename is far shorter and simpler than those that do. – LarsH Jun 04 '19 at 14:09
  • @LarsH, if you are working with 20 languages, I would not expect you to be able to use one catch-all regex. Personally, I would probably try to create a base file name generator, with the possibility to extend this with specific rules for those languages that need additional or different rules. This way you have a catch-all, and can handle language specifics as well. – AeonOfTime Jun 06 '19 at 05:56
  • 7
    A reality for many programs these days is that you don't know who the customers will be, or what languages they will use. For example if you're publishing to the general public in an app store or Windows or Apple store. You could make your software English-only (or European-only) by default, which is a common approach ... and a frustrating one for speakers of other languages searching for software for their needs. It can also be an avoidable loss of revenue for the developer. It doesn't take that much more effort to design programs to be largely script-agnostic. – LarsH Jun 11 '19 at 17:10
  • 6
    I'd say that any good code will say what it means. In this case, a whitelist feels a lot like a sort of “cargo cult” solution that will break in the case of millions of “unknown unknowns”. You're not disallowing *impossible* values, you're disallowing values that you're too afraid to test. – atimholt Apr 30 '20 at 22:07
  • 2
    @LarsH You can also try to allow as much as possible using Unicode, as suggested here: https://stackoverflow.com/a/61448658/4575793 Actually, almost everything is allowed, so maybe a whitelist is not the best approach. – Cadoiz Aug 16 '20 at 21:32
  • 1
    _"Name must start with a letter or number"_ So the name cannot start with an underscore? I'm confused. – Jeyekomon Jun 08 '21 at 16:30
  • @Jeyekomon: As I mentioned, it should start with a letter or number "_to avoid leading dots/spaces_". An underscore is acceptable, as is a hyphen or other alphanumerical characters. – AeonOfTime Jun 09 '21 at 21:05
  • 2
    @AeonOfTime Ah, in that case I would recommend rewording the line to simple _"Name must not start with a dot or space"_. – Jeyekomon Jun 11 '21 at 10:00
42

The easy way to get Windows to tell you the answer is to attempt to rename a file via Explorer and type in any illegal character, such as a backslash, \, in the new name. Windows will pop up a message box telling you the list of illegal characters:

A file name can't contain any of the following characters:
\ / : * ? " < > |

Here is a screenshot of that popup from Windows 10 Pro:

[Image: screenshot of the Windows 10 Pro error popup listing the forbidden characters]

See: Microsoft Docs - Naming Files, Paths, and Namespaces - Naming Conventions

Gabriel Staples
chrisjej
  • 3
    I remember that it used to be like that. I just tried it in Windows 10 and that message box is not showing up anymore, but a sound is being played instead. – Zsolti Jan 25 '21 at 21:02
  • 1
    [This is what the error message looks like](https://i.stack.imgur.com/k0CJD.png) - the edit adding it got rejected. The [archive link](http://web.archive.org/web/20170911113858/https://support.microsoft.com/en-us/help/177506/error-message-filename-is-invalid-or-cannot-contain-any-of-the-followi) is no longer necessary thanks to [the last editor](https://stackoverflow.com/users/4076315/bsmp). How is it in Win11? Message/Sound/??? – Cadoiz Jul 13 '23 at 13:52
29

Well, if only for research purposes, then your best bet is to look at this Wikipedia entry on Filenames.

If you want to write a portable function to validate user input and create filenames based on that, the short answer is don't. Take a look at a portable module like Perl's File::Spec to get a glimpse of all the hoops needed to accomplish such a "simple" task.

Leonardo Herrera
24

Discussing different possible approaches

The difficulty of defining what is legal and what is not has already been addressed, and whitelists have been suggested. However, not only Windows but also many Unix-like OSes support characters wider than 8 bits, such as Unicode. Encodings such as UTF-8 are relevant here as well. See Jonathan Leffler's comment, which gives information about modern Linux and describes details for macOS. Wikipedia states that (for example) the

modifier letter colon [(See 7. below) is] sometimes used in Windows filenames as it is identical to the colon in the Segoe UI font used for filenames. The [inherited ASCII] colon itself is not permitted.

Therefore, I want to present a much more liberal approach that uses Unicode homoglyph characters to replace the "illegal" ones. In my comparable use case I found the result far more readable, and it is limited only by the font in use, which is very broad: 3903 characters for the Windows default. Plus, you can even restore the original content from the replacements.

Using a whole Unicode block such as "fullwidth" as replacement

To keep things organized, I will always give the character, its name and the hexadecimal number representation. In the comments, i30817 talked about the idea of a reserved range just for 'idiotic OSes that abuse illegal characters', which is basically what Bill Sellers apparently does: "It is not as pretty but it always works and it is easier to remember." Among the candidate blocks, there are the fullwidth, small form variants, combining/modifier/overlay (see 4. below) or halfwidth characters. Consider this table for an overview:

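| No. | Character Name | Original Code | Original Char | Fullwidth Code | Fullwidth Char | Small Form Variant | Small Form Variant Code |
|---|---|---|---|---|---|---|---|
| 1. | Asterisk | U+2A | * | U+FF0A | ＊ | ﹡ | U+FE61 |
| 2. | Full Stop | U+2E | . | U+FF0E | ． | ﹒ | U+FE52 |
| 3. | Quotation Mark | U+22 | " | U+FF02 | ＂ | none | |
| 4. | Reverse Solidus | U+5C | \ | U+FF3C | ＼ | ﹨ | U+FE68 |
| 5. | Solidus | U+2F | / | U+FF0F | ／ | none | |
| 6.1. | Left Square Bracket | U+5B | [ | U+FF3B | ［ | ﹝ (only tortoise) | U+FE5D |
| 6.2. | Right Square Bracket | U+5D | ] | U+FF3D | ］ | ﹞ (only tortoise) | U+FE5E |
| 7. | Colon | U+3A | : | U+FF1A | ： | ﹕ | U+FE55 |
| 8. | Semicolon | U+3B | ; | U+FF1B | ； | ﹔ | U+FE54 |
| 9. | Vertical Line | U+7C | \| | U+FF5C | ｜ | none | |
| 10. | Comma | U+2C | , | U+FF0C | ， | ﹐ | U+FE50 |
| 11. | Question Mark | U+3F | ? | U+FF1F | ？ | ﹖ | U+FE56 |
| 12.1. | Greater-than Sign | U+3E | > | U+FF1E | ＞ | ﹥ | U+FE65 |
| 12.2. | Less-than Sign | U+3C | < | U+FF1C | ＜ | ﹤ | U+FE64 |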

Some of the fullwidth characters (1, 6.1, 6.2 and 11) are also included below at "more possible choices and research notes".
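The fullwidth replacement and its reversal can be sketched with a translation table, since each fullwidth form sits exactly 0xFEE0 above its ASCII original. This is only an illustration: `encode_name` and `decode_name` are hypothetical names, and the sketch replaces just the nine characters Windows forbids rather than everything in the table. Restoring is only lossless if the original name contained no fullwidth characters of its own.

```python
# Fullwidth forms live at a fixed offset from their ASCII counterparts.
FULLWIDTH_OFFSET = 0xFEE0

# The nine printable characters that Windows forbids in file names.
FORBIDDEN = '\\/:*?"<>|'

TO_FULLWIDTH = str.maketrans(
    {c: chr(ord(c) + FULLWIDTH_OFFSET) for c in FORBIDDEN}
)
FROM_FULLWIDTH = str.maketrans(
    {chr(ord(c) + FULLWIDTH_OFFSET): c for c in FORBIDDEN}
)


def encode_name(name: str) -> str:
    """Replace each forbidden ASCII character with its fullwidth homoglyph."""
    return name.translate(TO_FULLWIDTH)


def decode_name(name: str) -> str:
    """Restore the original ASCII characters from their fullwidth forms."""
    return name.translate(FROM_FULLWIDTH)
```

For example, `encode_name("a:b")` gives `a` followed by U+FF1A and `b`, and `decode_name` turns it back into `a:b`.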

How do you type non-standard characters

Say you want to type ⵏ (Tifinagh Letter Yan). To get all of its information, you can always search for this character (ⵏ) on a suitable platform such as this Unicode Lookup or that Unicode Table (which only allows searching by name, in this case "Tifinagh Letter Yan"). You should obtain its Unicode number U+2D4F and the HTML code &#11599; (note that 2D4F is hexadecimal for 11599). With this knowledge, you have several options to produce these special characters, including the use of

  • a code point to Unicode converter, or again the Unicode Lookup (please add 0x when you search for hex), to convert the numerical representation back into the Unicode character (remember to set the code point base to decimal or hexadecimal respectively)
  • a one-line macro in AutoHotkey: :?*:altpipe::{U+2D4F} types ⵏ in place of the string altpipe. This is the way I input those special characters; my AutoHotkey script can be shared if there is common interest
  • Alt characters or alt codes: press and hold Alt, then type the decimal number of the desired character (more info for example here, look at a table here or there). For the example, that would be Alt+11599. Be aware that many programs do not fully support this Windows feature for all of Unicode (as of the time of writing). Microsoft Office is an exception where it usually works, and some other OSes provide similar functionality. Typing these chars with Alt combinations into MS Word is also what Wally Brockway suggests in his already mentioned answer¹³. If you don't want to convert all the hexadecimal values to decimal yourself, you can find some of them there¹³.
  • in MS Office, you can also use Alt+X as described in this MS article to produce the chars
  • most OSes provide a character map accessory where you can find your special characters; often it also includes the option to search by name
  • if you rarely need it, you can of course still just copy-paste the special character of your choice instead of typing it
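
If you have Python at hand, the conversion between the U+ notation, the decimal number needed for alt codes and the character itself can be sketched as follows. The helper name describe is hypothetical; chr, int and unicodedata.name are from the standard library.

```python
import unicodedata


def describe(code_point):
    """Return (character, decimal value, Unicode name) for input like 'U+2D4F' or '2d4f'."""
    digits = code_point.strip()
    if digits[:2].upper() == "U+":
        digits = digits[2:]
    value = int(digits, 16)           # hexadecimal -> decimal
    character = chr(value)            # decimal -> character
    return character, value, unicodedata.name(character)


if __name__ == "__main__":
    print(describe("U+2D4F"))
    print(describe("u+2a"))
```

The decimal value in the middle of the result is the number to type after Alt, and the character can be copy-pasted directly from the output.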

More possible choices and research notes

So you're not happy with how the wider characters look? There are plenty of alternatives. Note: the hexadecimal number representation is not case sensitive, and leading zeroes can be added or omitted freely, so for example U+002A and u+2a are equivalent. If available, I'll try to point to more info or alternatives. Feel free to show me more or better ones.

  1. Instead of * (U+2A * ASTERISK), you can use one of the many listed, for example U+2217 ∗ (ASTERISK OPERATOR) or the Full Width Asterisk U+FF0A ＊. The combining asterisk above u+20f0 ⃰ (from Combining Diacritical Marks for Symbols) might also be a valid choice. See 4. for more info about the combining characters.

  2. Instead of . (U+2E . full stop), one of these could be a good option, for example ⋅ U+22C5 dot operator.

  3. Instead of " (U+22 " quotation mark), you can use “ U+201C LEFT DOUBLE QUOTATION MARK (for more alternatives, see here). I also included some of the good suggestions from Wally Brockway's answer, in this case u+2036 ‶ reversed double prime and u+2033 ″ double prime. From now on, I will mark ideas from that source with ¹³.

  4. Instead of / (U+2F / SOLIDUS), you can use ∕ DIVISION SLASH U+2215 (others here) or u+2044 ⁄ fraction slash¹³. You could also try ̸ U+0338 COMBINING LONG SOLIDUS OVERLAY or ̷ COMBINING SHORT SOLIDUS OVERLAY U+0337 but be aware about spacing for some characters, including the combining or overlay ones. They have no width on their own and can produce something like --> ̸th̷is which is ̸_th̷_is (underscores added for clarification to these 6 characters). With added spaces you get --> ̸ th ̷ is, which is ̸ _th ̷ _is (plus two spaces, makes 8 chars). The second one (COMBINING SHORT SOLIDUS OVERLAY) looks bad in the stackoverflow-font.

  5. Instead of \ (U+5C Reverse solidus), you can use ⧵ U+29F5 Reverse solidus operator (more) or u+20E5 ⃥ combining reverse solidus overlay¹³.

  6. To replace [ (U+5B [ Left square bracket) and ] (U+005D ] Right square bracket), you can use for example U+FF3B ［ FULLWIDTH LEFT SQUARE BRACKET and U+FF3D ］ FULLWIDTH RIGHT SQUARE BRACKET (from here, more possibilities here).

  7. Instead of : (u+3a : colon), you can use U+2236 ∶ RATIO (for mathematical usage) or U+A789 ꞉ MODIFIER LETTER COLON (see colon (letter); it is sometimes used in Windows filenames because it is identical to the colon in the Segoe UI font used for filenames, while the colon itself is not permitted; source and more replacements here). Another alternative is u+1361 ፡ ethiopic wordspace¹³.

  8. Instead of ; (u+3b ; semicolon), you can use U+037E ; GREEK QUESTION MARK (see here).

  9. For | (u+7c | vertical line), there are some good substitutes such as: U+2223 ∣ DIVIDES, U+0964 । DEVANAGARI DANDA, U+01C0 ǀ LATIN LETTER DENTAL CLICK (the last ones from Wikipedia) or U+2D4F ⵏ Tifinagh Letter Yan. Also the box drawing characters contain various other options.

  10. Instead of , (, U+002C COMMA), you can use for example ‚ U+201A SINGLE LOW-9 QUOTATION MARK (see here).

  11. For ? (U+003F ? QUESTION MARK), these are good candidates: U+FF1F ？ FULLWIDTH QUESTION MARK or U+FE56 ﹖ SMALL QUESTION MARK (from here and here). There are also two more in the Dingbats block (search for "question") and the u+203d ‽ interrobang¹³.

  12. While my machine seems to accept them unchanged, I still want to include > (u+3e greater-than sign) and < (u+3c less-than sign) for the sake of completeness. The best replacements here are probably also from the quotation block, such as u+203a › single right-pointing angle quotation mark and u+2039 ‹ single left-pointing angle quotation mark respectively. The tifinagh block only contains ⵦ (u+2D66)¹³ to replace <. A last option is ⋖ less-than with dot u+22D6 and ⋗ greater-than with dot u+22D7.

For even more ideas, you can also look for example into this block. You still want more ideas? You can try to draw your desired character and look at the suggestions here. Please comment if you find something valuable.

greybeard
  • 2,249
  • 8
  • 30
  • 66
Cadoiz
  • 1,446
  • 21
  • 31
  • 1
    I've made a program to apply these changes at https://github.com/DDR0/fuseblk-filename-fixer. Let me know if there's any characters (or patterns) I've missed! – DDR Sep 19 '20 at 04:54
  • 2
    It would be great if 'someone' at the unicode consortium reserved a range just for 'idiotic OSes that abuse illegal characters' whose font mapping would map to the 'illegal characters glyphs' but be different. Even replacements for the ? have different width and characteristics, leading me to want to replace ! too and be annoyed when even then the height is not consistent with '.' (for instance). – i30817 Jan 19 '22 at 04:18
  • it should probably be noted, that while the filesystem will accept these "alternatives" they are likely to cause issues elsewhere. i added U+FF3B to a file path, windows had no issues. however when i tried to System.IO.File.ReadAllBytes in c#, it crashed. so these should definitely not be used as a workaround to filesystem limitations. – Heriberto Lugo Nov 09 '22 at 20:49
  • 1
    This is what I do but I just use the fullwidth character options for all of them. It is not as pretty but it always works and it is easier to remember. I just search for 'fullwidth' in the Windows Character Map accessory. Halfwidth is also an option but the fullwidth options look a little better to me. But I agree with the suggestion of a 7bit ASCII duplication range in Unicode, or Windows could just use one of the private ranges... – Bill Sellers Jul 21 '23 at 08:28
  • @BillSellers I didn't find much useful in the halfwidth category. And you might be interested in my latest edit. – Cadoiz Aug 07 '23 at 12:35
14

On Windows, you can list the invalid characters using PowerShell:

$PathInvalidChars = [System.IO.Path]::GetInvalidPathChars() #36 chars

To display their UTF-8 byte values, you can convert them:

$enc = [system.Text.Encoding]::UTF8
$PathInvalidChars | foreach { $enc.GetBytes($_) }

$FileNameInvalidChars = [System.IO.Path]::GetInvalidFileNameChars() #41 chars

$FileOnlyInvalidChars = @(':', '*', '?', '\', '/') #5 chars - as a difference
Zoe
  • 27,060
  • 21
  • 118
  • 148
11

For anyone looking for a regex:

const BLACKLIST = /[<>:"\/\\|?*]/g;
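
For comparison, the same character class can be used from Python. This is a small sketch only; the names WINDOWS_RESERVED, has_reserved and strip_reserved are made up here, and like the regex above it does not cover control characters or reserved names such as CON.

```python
import re

# Same set as the JavaScript pattern above: < > : " / \ | ? *
WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def has_reserved(name):
    """True if the name contains at least one of the reserved characters."""
    return WINDOWS_RESERVED.search(name) is not None


def strip_reserved(name, replacement="_"):
    """Replace every reserved character with the given replacement string."""
    return WINDOWS_RESERVED.sub(replacement, name)


if __name__ == "__main__":
    print(has_reserved("report: final?.txt"))
    print(strip_reserved("report: final?.txt"))
```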
Kartik Soneji
  • 1,066
  • 1
  • 13
  • 25
7

In Windows 10 (2019), typing any of the following characters into a file name is rejected with this error:

A file name can't contain any of the following characters:

\ / : * ? " < > |

Cadoiz
  • 1,446
  • 21
  • 31
Bret Cameron
  • 451
  • 1
  • 4
  • 18
4

Here's a C# implementation for Windows based on Christopher Oezbek's answer.

It was made more complex by the containsFolder boolean, but hopefully covers everything.

// Requires: using System.Text; using System.Text.RegularExpressions;
/// <summary>
/// This will replace invalid chars with underscores, there are also some reserved words that it adds underscore to
/// </summary>
/// <remarks>
/// https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names
/// </remarks>
/// <param name="containsFolder">Pass in true if filename represents a folder\file (passing true will allow slash)</param>
public static string EscapeFilename_Windows(string filename, bool containsFolder = false)
{
    StringBuilder builder = new StringBuilder(filename.Length + 12);

    int index = 0;

    // Allow colon if it's part of the drive letter
    if (containsFolder)
    {
        Match match = Regex.Match(filename, @"^\s*[A-Z]:\\", RegexOptions.IgnoreCase);
        if (match.Success)
        {
            builder.Append(match.Value);
            index = match.Length;
        }
    }

    // Character substitutions
    for (int cntr = index; cntr < filename.Length; cntr++)
    {
        char c = filename[cntr];

        switch (c)
        {
            case '\u0000':
            case '\u0001':
            case '\u0002':
            case '\u0003':
            case '\u0004':
            case '\u0005':
            case '\u0006':
            case '\u0007':
            case '\u0008':
            case '\u0009':
            case '\u000A':
            case '\u000B':
            case '\u000C':
            case '\u000D':
            case '\u000E':
            case '\u000F':
            case '\u0010':
            case '\u0011':
            case '\u0012':
            case '\u0013':
            case '\u0014':
            case '\u0015':
            case '\u0016':
            case '\u0017':
            case '\u0018':
            case '\u0019':
            case '\u001A':
            case '\u001B':
            case '\u001C':
            case '\u001D':
            case '\u001E':
            case '\u001F':

            case '<':
            case '>':
            case ':':
            case '"':
            case '/':
            case '|':
            case '?':
            case '*':
                builder.Append('_');
                break;

            case '\\':
                builder.Append(containsFolder ? c : '_');
                break;

            default:
                builder.Append(c);
                break;
        }
    }

    string built = builder.ToString();

    if (built == "")
    {
        return "_";
    }

    if (built.EndsWith(" ") || built.EndsWith("."))
    {
        built = built.Substring(0, built.Length - 1) + "_";
    }

    // These are reserved names, in either the folder or file name, but they are fine if following a dot
    // CON, PRN, AUX, NUL, COM0 .. COM9, LPT0 .. LPT9
    builder = new StringBuilder(built.Length + 12);
    index = 0;
    // The trailing separator is matched with a lookahead so it is not consumed;
    // otherwise the second name in a path such as CON\NUL would be skipped
    foreach (Match match in Regex.Matches(built, @"(^|\\)\s*(?<bad>CON|PRN|AUX|NUL|COM\d|LPT\d)\s*(?=\.|\\|$)", RegexOptions.IgnoreCase))
    {
        Group group = match.Groups["bad"];
        if (group.Index > index)
        {
            // Copy everything between the previous keyword and this one unchanged
            builder.Append(built.Substring(index, group.Index - index));
        }

        builder.Append(group.Value);
        builder.Append("_");        // putting an underscore after this keyword is enough to make it acceptable

        index = group.Index + group.Length;
    }

    if (index == 0)
    {
        return built;
    }

    // Append whatever follows the last keyword, even if it is a single character
    if (index < built.Length)
    {
        builder.Append(built.Substring(index));
    }

    return builder.ToString();
}
Cadoiz
  • 1,446
  • 21
  • 31
  • I have three questions: 1. Why did you initialise ``StringBuilder`` with initial capacity value? 2. Why did you add 12 to the length of the ``filename``? 3. Was 12 chosen arbitrarily or was there some thought behind this number? – iiminov May 20 '20 at 14:46
  • 1
    Sorry for the delay, I just noticed this question 1) Initializing stringbuilder with a length is a bit of a micro optimization. I don't remember exactly, but it starts with a small buffer and doubles each time the buffer size is exceeded. 2) Adding a bit extra guarantees that the length isn't off by one. 3) The world would be better off if we use dozenal instead of decimal. 12 is the dozenal equivalent of adding 10 (I just needed to pad the length by a small arbitrary amount). – Charlie Rix Apr 01 '21 at 17:58
4

The .NET Framework System.IO namespace provides the following functions for invalid file system characters:

  • Path.GetInvalidFileNameChars
  • Path.GetInvalidPathChars

Those functions should return appropriate results depending on the platform the .NET runtime is running in. That said, the Remarks in the documentation pages for those functions say:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system.

gridtrak
  • 731
  • 7
  • 20
3

The only illegal Unix characters may be / and NUL, but some consideration for command-line interpretation should be included.

For example, while it might be legal to name a file 1>&2 or 2>&1 in Unix, file names such as this might be misinterpreted when used on a command line.

Similarly it might be possible to name a file $PATH, but when trying to access it from the command line, the shell will translate $PATH to its variable value.
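
When such names have to be passed through a POSIX shell from a script, quoting them avoids the misinterpretation. A minimal Python sketch, assuming a POSIX shell; shlex.quote is from the standard library, while the helper name shell_command is made up for this example.

```python
import shlex


def shell_command(program, *filenames):
    """Build a POSIX shell command line in which every file name is taken literally."""
    return " ".join([program, *(shlex.quote(name) for name in filenames)])


if __name__ == "__main__":
    # Without quoting, the shell would treat 2>&1 as a redirection
    # and expand $PATH to the variable's value.
    print(shell_command("cat", "2>&1", "$PATH"))  # cat '2>&1' '$PATH'
```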

CodeMouse92
  • 6,840
  • 14
  • 73
  • 130
  • 2
    for literals in BASH, the best way I've found to declare literals without interpolation is `$'myvalueis'`, ex: `$ echo 'hi' > $'2>&1'`, `cat 2\>\&1` "hi" – ThorSummoner Jul 07 '17 at 19:42
3

I always assumed that banned characters in Windows filenames meant that all exotic characters would also be outlawed. The inability to use ?, / and : in particular irked me. One day I discovered that it was virtually only those chars which were banned. Other Unicode characters may be used. So the nearest Unicode characters to the banned ones I could find were identified and MS Word macros were made for them as Alt+?, Alt+: etc. Now I form the filename in Word, using the substitute chars, and copy it to the Windows filename. So far I have had no problems.

Here are the substitute chars (Alt + the decimal Unicode) :

  • ⃰ ⇔ Alt8432
  • ⁄ ⇔ Alt8260
  • ⃥ ⇔ Alt8421
  • ∣ ⇔ Alt8739
  • ⵦ ⇔ Alt11622
  • ⮚ ⇔ Alt11162
  • ‽ ⇔ Alt8253
  • ፡ ⇔ Alt4961
  • ‶ ⇔ Alt8246
  • ″ ⇔ Alt8243

As a test I formed a filename using all of those chars and Windows accepted it.

Cadoiz
  • 1,446
  • 21
  • 31
  • I took the freedom to improve your formatting for better readability. I also explained the same base idea above and now incorporated some of your suggestions, if that's okay. Thank you! https://stackoverflow.com/a/61448658/4575793 – Cadoiz Apr 21 '21 at 20:39
1

This is good enough for me in Python:

import re


def fix_filename(name, max_length=255):
    """
    Replace invalid characters on Linux/Windows/MacOS with underscores.
    List from https://stackoverflow.com/a/31976060/819417
    Trailing spaces & periods are ignored on Windows.
    >>> fix_filename("  COM1  ")
    '_ COM1 _'
    >>> fix_filename("COM10")
    'COM10'
    >>> fix_filename("COM1,")
    'COM1,'
    >>> fix_filename("COM1.txt")
    '_.txt'
    >>> all('_' == fix_filename(chr(i)) for i in list(range(32)))
    True
    """
    return re.sub(r'[/\\:|<>"?*\0-\x1f]|^(AUX|COM[1-9]|CON|LPT[1-9]|NUL|PRN)(?![^.])|^\s|[\s.]$', "_", name[:max_length], flags=re.IGNORECASE)

See also this outdated list for additional legacy stuff like = in FAT32.

Cees Timmerman
  • 17,623
  • 11
  • 91
  • 124
1

The OP's question has already been fully answered here and here, for instance. Here I am just extending those answers by showing how to fix it on Linux:

In Linux, find all file and folder names with characters which are forbidden in Windows

If you're on Linux, and you just want to find all file and folder names with characters which are forbidden in Windows, you can run the following command:

# Find all files and folders with any of these Windows-illegal characters in
# their name:  \ : * ? " < > |
find . -name '*[\\:\*?\"<\>|]*'

This is really useful, for instance, so you can manually clean or "fix" a git code repository written on Linux which you now need to clone and use on Windows. If you don't find and clean out and fix all of the Windows-incompatible characters in file and folder names first, the repository will fail to clone on Windows, and you'll see errors like this, for instance:

$ git clone https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world.git
Cloning into 'eRCaGuy_hello_world'...
remote: Enumerating objects: 4342, done.
remote: Counting objects: 100% (1184/1184), done.
remote: Compressing objects: 100% (366/366), done.
remote: Total 4342 (delta 819), reused 1149 (delta 799), pack-reused 3158Receiving objects: 100% (4342/4342), 6.50 Mi
Receiving objects: 100% (4342/4342), 7.02 MiB | 6.50 MiB/s, done.

Resolving deltas: 100% (2725/2725), done.
error: invalid path 'cpp/class_copy_constructor_and_assignment_operator/Link to Copy constructor vs assignment operat
or in C++ - GeeksforGeeks%%%%% [see `t2 = t1;  -- calls assignment operator, same as "t2.operator=(t1);" `].desktop'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Above, you can see the error: invalid path message that made the git clone fail. My filename at path cpp/class_copy_constructor_and_assignment_operator/Link to Copy constructor vs assignment operator in C++ - GeeksforGeeks%%%%% [see `t2 = t1; -- calls assignment operator, same as "t2.operator=(t1);" `].desktop is invalid on Windows because it has the double quote (") character in it, so the repository fails to clone there. So, I'm going to manually rename that file on Linux, removing the " chars, and push the changes to my git repository so that I can then clone it on Windows.

Keep Windows file paths <= 259 chars, and folder paths <= 248 chars (git clone error: Filename too long)

Even if you remove the forbidden chars from your folder and filenames by finding them with the find . -name '*[\\:\*?\"<\>|]*' command above, keep in mind that the Windows MAX_PATH limitation is still in place, limiting your total path length to <= 259 chars for a file, or <= 248 chars for a folder. See here: Maximum filename length in NTFS (Windows XP and Windows Vista)?

If you violate this path limit and then try to git clone a repo on Windows, you'll get this Filename too long error:

$ git clone https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world.git
Cloning into 'eRCaGuy_hello_world'...
remote: Enumerating objects: 4347, done.
remote: Counting objects: 100% (1189/1189), done.
remote: Compressing objects: 100% (370/370), done.
remote: Total 4347 (delta 823), reused 1152 (delta 800), pack-reused 3158
Receiving objects: 100% (4347/4347), 7.03 MiB | 5.82 MiB/s, done.
Resolving deltas: 100% (2729/2729), done.
error: unable to create file cpp/class_copy_constructor_and_assignment_operator/Link to Copy constructor vs assignmen
t operator in C++ - GeeksforGeeks%%%%% [see `t2 = t1;  -- calls assignment operator, same as ''t2.operator=(t1);'' `]
.desktop: Filename too long
Updating files: 100% (596/596), done.
Filtering content: 100% (8/8), 2.30 MiB | 2.21 MiB/s, done.
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Notice this part, because of my ridiculously-long filename:

error: unable to create file cpp/class_copy_constructor_and_assignment_operator/Link to Copy constructor vs assignment operator in C++ - GeeksforGeeks%%%%% [see `t2 = t1; -- calls assignment operator, same as ''t2.operator=(t1);'' `].desktop: Filename too long

Shorten your long filename to reduce the path length, commit and push the change, and try to clone again.
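
Both checks (forbidden characters and path length) can also be run in one pass before pushing. The following Python sketch is only an illustration under a few assumptions: it uses the same character set as the find command above and the 259-character file limit mentioned above, it ignores the separate 248-character folder limit, and the names check_relative_path, scan_tree and checkout_prefix_len (the length of the directory you expect to clone into on Windows) are hypothetical.

```python
import os
import re

# Same characters as the find command above: \ : * ? " < > |
WINDOWS_FORBIDDEN = re.compile(r'[\\:*?"<>|]')
MAX_FILE_PATH = 259  # file path limit quoted in the text above


def check_relative_path(rel_path, checkout_prefix_len=0):
    """Return reasons why rel_path (with '/' separators) may fail to check out on Windows."""
    problems = []
    for component in rel_path.split("/"):
        if WINDOWS_FORBIDDEN.search(component):
            problems.append("forbidden character in " + repr(component))
    if checkout_prefix_len + len(rel_path) > MAX_FILE_PATH:
        problems.append("path too long")
    return problems


def scan_tree(root, checkout_prefix_len=0):
    """Walk root and yield (relative_path, problems) for each problematic entry."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            problems = check_relative_path(rel, checkout_prefix_len)
            if problems:
                yield rel, problems


if __name__ == "__main__":
    for rel, problems in scan_tree("."):
        print(rel, "->", ", ".join(problems))
```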

References:

  1. While on Windows 10 Pro, I tried to type a " into a folder name, and I got this popup window error:

    [screenshot of the Windows error popup]

  2. I used https://regex101.com/ (see: https://regex101.com/r/lI5Lg9/1), to build and test the [\\:\*?\"<\>|] regular expression to know which characters to escape, by looking in the "Explanation" section on the right-hand side:

    [screenshot of the regex101 "Explanation" section]

Gabriel Staples
  • 36,492
  • 15
  • 194
  • 265
0

As of 18/04/2017, no simple black or white list of characters and filenames is evident among the answers to this topic - and there are many replies.

The best suggestion I could come up with was to let the user name the file however he likes. Using an error handler when the application tries to save the file, catch any exceptions, assume the filename is to blame (obviously after making sure the save path was ok as well), and prompt the user for a new file name. For best results, place this checking procedure within a loop that continues until either the user gets it right or gives up. Worked best for me (at least in VBA).
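
I did this in VBA, but the same loop can be sketched in Python. This is only an illustration, not the original code: save_with_retry and its parameters are made-up names, and ask_name stands for whatever prompts the user (for example a dialog or input).

```python
import os


def save_with_retry(data, ask_name, directory=".", max_attempts=3):
    """Ask for a file name and try to save; on failure, ask again.

    Returns the path that was written, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        name = ask_name()
        path = os.path.join(directory, name)
        try:
            # Mode "x" refuses to overwrite an existing file.
            with open(path, "x", encoding="utf-8") as handle:
                handle.write(data)
            return path
        except (OSError, ValueError) as error:
            # OSError: the OS rejected the name or path.
            # ValueError: Python rejected it (for example an embedded NUL).
            print(f"Could not save as {name!r}: {error}")
    return None


if __name__ == "__main__":
    saved = save_with_retry("hello", lambda: input("File name: "))
    print("Saved to", saved)
```

Limiting the number of attempts is one way of letting the user give up; a cancel option in the prompt would serve the same purpose.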

David Spector
  • 1,520
  • 15
  • 21
FCastro
  • 581
  • 6
  • 7
  • 3
    Your answer @FCastro is correct from the technical point of view. However from the UX perspective it's a nightmare - the user is forced to play the "type something and I'll tell you if you succeed" game again and again. I'd rather see a message (warning style) telling the user that they have entered an illegal character which will later be converted. – Mike Sep 22 '17 at 12:44
  • 2
    Christopher Oezbek provided such a black list in 2015. – Jim Balter Oct 08 '18 at 19:01
-2

In Unix shells, you can quote almost every character in single quotes '. The exceptions are the single quote itself and control characters, which you can't express because \ is not expanded inside single quotes. Accessing the single quote itself is still possible, because you can concatenate strings with single and double quotes, like 'I'"'"'m', which can be used to access a file called "I'm" (double quotes alone are also possible here).

So you should avoid all control characters, because they are too difficult to enter in the shell. The rest is still tricky, especially files starting with a dash, because most commands read those as options unless you put two dashes -- before them, or you specify them with ./, which also hides the leading -.

If you want to be nice, don't use any of the characters the shell and typical commands use as syntactical elements, sometimes position dependent, so e.g. you can still use -, but not as first character; same with ., you can use it as first character only when you mean it ("hidden file"). When you are mean, your file names are VT100 escape sequences ;-), so that an ls garbles the output.
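
As a rough illustration of these rules, here is a small Python sketch that flags names which are legal but awkward in a shell. The function name awkward_reasons and the choice of checks are mine; it only covers the three cases mentioned above.

```python
import unicodedata


def awkward_reasons(name):
    """List reasons why a legal file name is unpleasant to handle in a shell."""
    reasons = []
    # Category "Cc" covers control characters such as ESC, newline and tab.
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        reasons.append("control character")
    if name.startswith("-"):
        reasons.append("leading dash")
    if name.startswith("."):
        reasons.append("leading dot (hidden file)")
    return reasons


if __name__ == "__main__":
    for candidate in ["notes.txt", "-rf", ".profile", "\x1b[31mred"]:
        print(repr(candidate), awkward_reasons(candidate))
```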

forthy42
  • 102
  • 5
-2

When Windows creates an internet shortcut, it builds the file name by skipping illegal characters, except for the forward slash, which is converted to a minus sign.

Matthias Ronge
  • 9,403
  • 7
  • 47
  • 63
-6

I had the same need, was looking for recommendations or standard references, and came across this thread. My current blacklist of characters that should be avoided in file and directory names is:

$CharactersInvalidForFileName = {
    "pound" -> "#",
    "left angle bracket" -> "<",
    "dollar sign" -> "$",
    "plus sign" -> "+",
    "percent" -> "%",
    "right angle bracket" -> ">",
    "exclamation point" -> "!",
    "backtick" -> "`",
    "ampersand" -> "&",
    "asterisk" -> "*",
    "single quotes" -> "“",
    "pipe" -> "|",
    "left bracket" -> "{",
    "question mark" -> "?",
    "double quotes" -> "”",
    "equal sign" -> "=",
    "right bracket" -> "}",
    "forward slash" -> "/",
    "colon" -> ":",
    "back slash" -> "\\",
    "lank spaces" -> "b",
    "at sign" -> "@"
};
Meng Lu
  • 13,726
  • 12
  • 39
  • 47
  • 5
    would you mind commenting on having `@` in the list ? – PypeBros Oct 25 '16 at 08:28
  • 12
    The question was which characters are illegal. Most of the characters in your list are legal. – Nigel Alderton Jan 11 '17 at 12:18
  • 8
    the letter `b`? lol, I assume that's the b from `lank spaces`... well that still leaves a few... I renamed a picture `(),-.;[]^_~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ.jpg` but had to change it back because it looked *angry*... – ashleedawg Mar 03 '18 at 07:16