224

Where can I find a list of allowed characters in filenames, depending on the operating system? (e.g., on Linux, the character : is allowed in filenames, but not on Windows)

python dude
  • .NET provides that info for Windows. – leppie Jan 27 '11 at 08:20
  • http://stackoverflow.com/questions/2679699/what-characters-allowed-in-file-names-on-android/13502029 – k4dima Nov 21 '12 at 21:07
  • 10
    @kreker note that your question is about Android – congusbongus Aug 10 '13 at 14:02
  • @congusbongus http://en.wikipedia.org/wiki/Comparison_of_file_systems – k4dima Aug 10 '13 at 15:27
  • 2
    Possible duplicate of [What characters are forbidden in Windows and Linux directory names?](https://stackoverflow.com/q/1976007/608639) – jww Jul 04 '19 at 17:18
  • 2
    Not sure how this could be considered a "recommendation for books, tools, software libraries, and more". It's clearly asking what the allowed characters are for a variety of filesystems, something that's quite handy if you're looking to use a common base. I see this as no different than asking what any specific limitation is. I suspect the recommendation reason for closure is more suited for *actual* requests for recommendations, such as "What's a good book for learning Python programming?". – paxdiablo Nov 21 '20 at 06:26
  • 5
    @paxdiablo Just voted to reopen. – Piotr Dobrogost Apr 05 '21 at 13:34
  • 1
    I have also voted to re-open, this is a valid question. It does not ask for recommendation, it is asking for the source of information. – AaA Sep 09 '21 at 07:54

8 Answers

151

You should start with the Wikipedia Filename page. It has a decent-sized table (Comparison of filename limitations), listing the reserved characters for quite a lot of file systems.

It also has a plethora of other information about each file system, including reserved file names such as CON under MS-DOS. I mention that only because I was bitten by that once when I shortened an include file from const.h to con.h and spent half an hour figuring out why the compiler hung.

Turns out DOS ignored extensions for devices so that con.h was exactly the same as con, the input console (meaning, of course, the compiler was waiting for me to type in the header file before it would continue).
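
The device-name trap described above can be checked for in code. The following is only a minimal sketch: the helper name `is_reserved_windows_name` is hypothetical, and the name list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is assumed to be the commonly documented set of DOS/Windows device names. It ignores the extension, mirroring the con.h behavior.

```python
# Sketch: detect DOS/Windows reserved device names, ignoring any extension.
# The helper name is hypothetical; the list is the commonly documented set.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(filename: str) -> bool:
    """Return True if the part before the first dot is a reserved device name."""
    stem = filename.split(".", 1)[0]
    # Device names are matched without regard to case.
    return stem.upper() in RESERVED_DEVICE_NAMES
```

With this sketch, `is_reserved_windows_name("con.h")` is true while `is_reserved_windows_name("const.h")` is false.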

paxdiablo
  • 7
    I find the Wikipedia page somewhat vague and confusing, e.g. "Some operating systems prohibit some particular characters...". I'm actually looking for a complete table that lists all allowed and disallowed characters. – python dude Jan 27 '11 at 08:30
  • 7
    @python, don't look at that table, look at the big honkin' one underneath it (entitled "Comparison of file name limitations"). That's _not_ so vague in its content. – paxdiablo Jan 27 '11 at 08:34
  • 74
    Probably all you need is to look at the `POSIX "Fully portable filenames"` entry, which lists these: `A–Z a–z 0–9 . _ -` – Val Kornea Jul 02 '14 at 22:31
  • 1
    @VladimirKornea thanks! Links: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_07 || http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_277 – Ciro Santilli OurBigBook.com Aug 06 '15 at 20:59
  • I'd like to see the reasoning behind the POSIX "Fully portable filenames" thing. I've noticed that # works on OSX + Ubuntu and its not in the list of "reserved characters" for windows. Brackets also seem to work so what gives? – CpILL Dec 29 '15 at 09:28
  • 1
    @CpILL There are more OSs than just Windows, OSX and Linux... some have very simple file systems. – elegant dice Feb 08 '16 at 09:25
  • @elegant dice yeah, but on the web thats all we care about and most app developers I imagine. The assumptions for the different OSs would be handy so one can build the list for just the OS you care about. – CpILL Feb 10 '16 at 03:53
  • 1
    @CpILL you ask for the reasoning behind POSIX "Fully portable filenames", and that is the reason. The world is bigger than just the web... Also note that the web is also made up of IoT devices which can have some very limited and specific OSs. Someone has to write embedded and mainframe software. I agree that a list for the diff OSs would be useful tho. – elegant dice Feb 10 '16 at 23:54
  • 2
    @CpILL Notice that # must be urlencoded in a web context or it will break URLs, since # in the URL indicates the start of the hash fragment. None of the POSIX "Fully portable filenames" need to be urlencoded. Even if all the OSs you care about allow # characters in the filename, you might have been better off allowing only "portable" characters, for some such other reason that you haven't considered yet. – Val Kornea May 13 '16 at 11:30
  • @elegant-dice i wanted to know the _details_ of the reasoning so i can ignore the OS's that are not in the big 3 or are defunct etc. – CpILL May 18 '16 at 10:05
  • You are right @Vladimir-Kornea, I forgot about the special meaning of # in URLs – CpILL May 18 '16 at 10:05
  • 2
    @VladimirKornea the question states "depending on the operating system" and not URLs. You should always pass your filenames through a URL encoder/decoder in any case. – CpILL Nov 30 '16 at 13:18
  • Even though the table from the Wiki is correct for file systems, it is missing the reserved names for each OS, which are only listed under the notes (note `i` for Windows and OS/2, for example). For instance, you cannot name a file COM1, and you can crash Windows by naming a file with a reserved GUID in specific places. – AaA Sep 09 '21 at 08:02
  • @AaA: hence my comment about the plethora of other information on that page, specifically calling out things such as reserved names. – paxdiablo Sep 09 '21 at 09:27
  • @paxdiablo, no argument about your answer, which is correct; just adding extra information, which bit me just a few days ago and which I figured out an hour ago. Something like {ED7BA470-8E54-465E-825C-99712043E01C} as a file extension in Windows. – AaA Sep 09 '21 at 09:36
96

OK, so looking at Comparison of file systems, if you only care about the main players' file systems, the safe common subset is:

any byte except NUL, \, /, :, *, ?, ", <, >, |. You also can't have files/folders called . or .., and no control characters (of course).
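
As a rough illustration, the rule above could be expressed as a check on a single name. This is a sketch, not a complete validator: `is_broadly_safe_name` is a hypothetical helper, treating the empty string as invalid is an added assumption, and reserved device names and length limits are not covered.

```python
# Sketch: test one file name against the conservative rule stated above.
# Hypothetical helper; reserved device names and length limits are not checked.
FORBIDDEN_CHARS = set('\\/:*?"<>|')


def is_broadly_safe_name(name: str) -> bool:
    """Return True if `name` avoids the characters and names listed above."""
    # Assumption: an empty name is treated as invalid alongside "." and "..".
    if name in ("", ".", ".."):
        return False
    for ch in name:
        # Code points 0-31 and 127 are NUL and the other ASCII control characters.
        if ch in FORBIDDEN_CHARS or ord(ch) < 32 or ord(ch) == 127:
            return False
    return True
```

For example, `is_broadly_safe_name("report-2024.txt")` is true, while `"a:b"` and `".."` are both rejected.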

shawn
CpILL
  • 10
    This is not correct. Linux doesn't allow `/`. Windows doesn't allow backslash and some strings (e.g. `CON`). – kgadek Mar 23 '17 at 17:42
  • 29
    yeah, hence i said *except*. – CpILL May 15 '17 at 17:33
  • 3
    On Mac (running HFS+), I am able to create files with `:`s in their names. – erwaman Oct 25 '17 at 20:38
  • This is not correct. See [this answer](https://stackoverflow.com/a/39273548/2415524) for more characters that Windows does not allow. – mbomb007 Nov 01 '17 at 20:42
  • Windows does not allow any controls chars, either (but the Mac does, other than NUL) – Thomas Tempelmann Nov 29 '17 at 17:02
  • The Mac does allow "/" in a file name when using the classic (Carbon) APIs, and ":" when using the POSIX APIs (and it swaps them, so if you enter a name with "/" in the Finder, which is legal, it'll show up as a ":" when checking the name in Terminal, for instance) – Thomas Tempelmann Nov 29 '17 at 17:04
  • This is wrong for ext[2-4]. Per link you provided, it says "Any byte except `NUL`, `/`" – KFL Mar 26 '18 at 02:53
  • 4
    using %$# in paths will cause issues in bash scripts (cd $mydir) using % in paths will cause issues in windows scripts (cd %1) – Systemsplanet Jun 26 '18 at 02:38
  • @Systemsplanet's comment can be interpreted in two ways, so to clarify: if you remember to quote/escape those characters, they will not cause issues for your scripts. – mtraceur Aug 13 '23 at 05:03
29

On Windows, create a file and try to put an invalid character such as \ in its name. You will get a popup listing all the characters that are invalid in a filename: \ / : * ? " < > |

Devid
7

To be more precise about Mac OS X (now called macOS): a / in the Finder is translated to a : in the Unix file system.

This was done for backward compatibility when Apple moved from Classic Mac OS.

It is legitimate to use a / in a file name in the Finder; looking at the same file in the terminal, it will show up with a :.

And it works the other way around too: you can't use a / in a file name with the terminal, but a : is OK and will show up as a / in the Finder.

Some applications may be more restrictive and prohibit both characters to avoid confusion or because they kept logic from previous Classic Mac OS or for name compatibility between platforms.
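
The swap described above can be sketched as a pair of string mappings. Both helper names are hypothetical, and this only models the renaming rule as described in this answer; it does not call any macOS API.

```python
# Sketch of the Finder <-> terminal name mapping described above.
# Both helper names are hypothetical; no macOS API is involved.

def finder_to_posix(display_name: str) -> str:
    """Name as typed in the Finder -> name as seen in the terminal."""
    return display_name.replace("/", ":")


def posix_to_finder(posix_name: str) -> str:
    """Name as seen in the terminal -> name as shown in the Finder."""
    return posix_name.replace(":", "/")
```

So a file shown as `notes 1/2.txt` in the Finder would appear as `notes 1:2.txt` in the terminal, and vice versa.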

WebDevBooster
1

Rather than trying to identify all the unwanted characters, you could strip everything except the acceptable ones. Here's a regex that removes anything outside the POSIX portable filename characters (Python's re module has no [:alnum:] class, so the ranges are spelled out):

import re
cleaned_name = re.sub(r'[^A-Za-z0-9._-]', '', name)

Dog Pilot
0

For "English locale" file names, this works nicely. I'm using this for sanitizing uploaded file names. The file name is not meant to be linked to anything on disk; it is only used when the file is being downloaded, hence there are no path checks.

$file_name = preg_replace('/([^\x20-~]+)|([\\\\\/:*?"<>|]+)/', '_', $client_specified_file_name);

Basically it replaces every run of non-printable characters, and every run of characters reserved on Windows and other OSs, with an underscore. You can easily extend the pattern to support other locales and functionalities.

TheRealChx101
0

I took a different approach. Instead of checking whether the string contains only valid characters, I look for invalid/illegal characters.

NOTE: I needed to validate a path string, not a filename. But if you need to check a filename, simply add / to the set.

def check_path_validity(path: str) -> bool:
    # Check for invalid characters
    for char in set('\\?%*:|"<>'):
        if char in path:
            print(f"Illegal character {char} found in path")
            return False
    return True
Rob
-1

Here is code to clean a file name in Python.

import re
import unicodedata

def clean_name(name, replace_space_with=None):
    """
    Remove invalid file name chars from the specified name

    :param name: the file name
    :param replace_space_with: if not None, replace spaces with this string
    :return: a valid name for Win/Mac/Linux
    """

    # ref: https://en.wikipedia.org/wiki/Filename
    # ref: https://stackoverflow.com/questions/4814040/allowed-characters-in-filename
    # No control chars, no: /, \, ?, %, *, :, |, ", <, >

    # remove control chars
    name = ''.join(ch for ch in name if unicodedata.category(ch)[0] != 'C')

    cleaned_name = re.sub(r'[/\\?%*:|"<>]', '', name)
    if replace_space_with is not None:
        return cleaned_name.replace(' ', replace_space_with)
    return cleaned_name
Du D.
  • 3
    The code does not check for invalid (reserved) names, and does not check for an invalid character in replace_space_with, too. Length of file name is beyond of scope. So, `:return: a valid name for Win/Mac/Linux` is not true in all circumstances. – AcK Sep 25 '18 at 08:09
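
The two gaps named in this comment could be closed roughly as follows. This is a sketch under stated assumptions, not the answer author's code: `clean_name_checked` is a hypothetical name, the reserved list is assumed to be the commonly documented DOS/Windows device names, and prefixing an underscore to a reserved name is an arbitrary policy chosen for illustration. Name length is still not handled.

```python
import re
import unicodedata

# Hypothetical sketch; the reserved list is the commonly documented set of
# DOS/Windows device names, and name length is still not checked.
_FORBIDDEN = re.compile(r'[/\\?%*:|"<>]')
_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def _is_control(ch: str) -> bool:
    """True for characters in any Unicode 'C' (control/other) category."""
    return unicodedata.category(ch)[0] == "C"


def clean_name_checked(name, replace_space_with=None):
    """Clean a file name and guard against the two gaps named in the comment."""
    # Reject a replacement string that would reintroduce an invalid character.
    if replace_space_with is not None and (
        _FORBIDDEN.search(replace_space_with)
        or any(_is_control(ch) for ch in replace_space_with)
    ):
        raise ValueError("replace_space_with contains an invalid character")

    # Drop control characters, then the forbidden punctuation.
    cleaned = _FORBIDDEN.sub("", "".join(ch for ch in name if not _is_control(ch)))
    if replace_space_with is not None:
        cleaned = cleaned.replace(" ", replace_space_with)

    # Assumed policy: prefix an underscore when the stem is a reserved name.
    if cleaned.split(".", 1)[0].upper() in _RESERVED:
        cleaned = "_" + cleaned
    return cleaned
```

With this sketch, `clean_name_checked("con.h")` gives `_con.h`, and passing `"/"` as `replace_space_with` raises `ValueError`.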