80

Consider a Save As dialog with a free-text field where the user enters a file name, then clicks a Save button. The software then validates the file name and saves the file if the name is valid.

On a Unix file system, what rules should be applied in the validation such that:

  • The name will not be difficult to manipulate later in terms of escaping special characters, etc.
  • The rules are not so restrictive that saving a file becomes non-user-friendly.

So basically, what is the minimum set of characters that should be restricted from a Unix file name?

barrymc

7 Answers

66

The minimum set is slash ('/') and NUL ('\0').
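
A minimal sketch of that rule in Python, assuming the name is a single path component rather than a full path. The function name `is_minimally_valid` is hypothetical, and the extra rejection of empty names and of `.` and `..` is an assumption added here (those entries already exist in every directory), not part of the answer itself.

```python
def is_minimally_valid(name: str) -> bool:
    """Return True if `name` can be used as a single Unix path component."""
    # '/' separates path components and NUL terminates C strings,
    # so neither can appear inside a file name.
    if "/" in name or "\0" in name:
        return False
    # Assumption beyond the answer: empty names and the special
    # entries "." and ".." cannot name a new file.
    return name not in ("", ".", "..")
```

Everything else, including spaces, quotes, and newlines, passes this check, which is why the other answers suggest stricter rules for names created by a Save As dialog.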

mouviciel
  • 1
    The minimum is /, ; and | to avoid the user running arbitrary commands (assuming it's not escaped :)) – workmad3 Jan 19 '09 at 15:43
  • 4
    This. No characters besides '/' should be disallowed. – nobody Jan 19 '09 at 15:47
  • 3
    And ASCII NUL '\0' since that marks the end of the file name :D – Jonathan Leffler Jan 19 '09 at 15:47
  • 5
    This is the rigorous answer. The application should be coded to assume that the user was this unconstrained (so when opening files, it should accept any name). It isn't such a good answer for saving (new) files; it is reasonable to put some limits on the file names. – Jonathan Leffler Jan 19 '09 at 15:56
  • @mouviciel: given that some filesystems like FAT support the NULL character, what would happen if a NULL character is present in the middle of a filename? – user2284570 Oct 02 '15 at 15:45
  • @user2284570: I don't know. My bet is that it is not possible in the context of a "Save as..." dialog box. – mouviciel Oct 02 '15 at 16:12
  • @mouviciel: in the case where the filename was recorded from another OS, of course. – user2284570 Oct 02 '15 at 16:43
  • I'd add ' and " since they can at least lead to troublesome unmatched-quote situations at times. Also ( and ), since shells don't like those. And : is a separator in paths. – Alan Corey Mar 27 '21 at 22:52
44

Firstly, what you're describing is blacklisting. Your better option is to whitelist your characters, as it is easier (from a user perspective) to have allowed characters added later than to have them taken away.

In terms of what would be good in a unix environment:

  • a-z
  • A-Z
  • 0-9
  • underscore (_)
  • dash (-)
  • period (.)

These should cover your basics. Spaces can be okay, but they make things difficult. Windows users love them; Unix/Linux users don't. So choose according to your target audience.
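
A minimal sketch of this whitelist as a regular expression in Python. The names `SAFE_NAME` and `is_whitelisted` are hypothetical, and the pattern is ASCII-only on purpose, so accented letters are rejected.

```python
import re

# Letters, digits, underscore, dash, and period only (ASCII).
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

def is_whitelisted(name: str) -> bool:
    """Return True if every character of `name` is in the whitelist."""
    # fullmatch requires the whole string to match, so an empty
    # name or any character outside the class fails.
    return SAFE_NAME.fullmatch(name) is not None
```

To allow spaces for a Windows-leaning audience, add a space inside the character class.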

chrki
Gavin Miller
  • 2
    Newlines are a nuisance. Commas are pretty harmless. Colons do no damage in Unix, but are problematic if the name is copied to Windows - or if the 'file' is a directory that might need to be added to PATH. – Jonathan Leffler Jan 19 '09 at 15:53
  • 3
    There is some room to argue that any characters classified as 'isalpha()' in the current locale are OK - that allows people to use accented characters in the names. It complicates the story, though. – Jonathan Leffler Jan 19 '09 at 15:57
  • 30
    I for one will regard anything that prohibits accented characters as user-unfriendly –  Jan 19 '09 at 16:45
  • 5
    What happens with file names in different languages? – Dr. Koutheir Attouchi Jun 06 '17 at 07:57
  • colons are problematic in Linux if it's a directory name that is in your $PATH variable - colons are used as separators in the $PATH – Greg Smith Sep 07 '21 at 17:51
28

Although the accepted answer is technically true, I think there's a benefit to restricting some characters that could be annoying for scripting or other uses:

  • forward slash (/)
  • backslash (\)
  • NULL (\0)
  • tick (`)
  • starts with a dash (-)
  • star (*)
  • pipes (|)
  • semicolon (;)
  • quotations (" or ')
  • colon (:)

( - maybe space though I'm reluctant to add that.)

As you can see you might just be better off whitelisting as @Gavin suggests...
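
A minimal sketch of the blacklist above in Python, reporting why a name is awkward instead of only rejecting it. The names `AWKWARD_CHARS` and `find_problems` are hypothetical, and the space is left out, as in the list.

```python
# Characters from the list above: / \ NUL ` * | ; " ' :
AWKWARD_CHARS = set('/\\\0`*|;"\':')

def find_problems(name: str) -> list[str]:
    """Return human-readable reasons why `name` is awkward for scripting."""
    problems = []
    # A leading dash makes the name look like a command-line option.
    if name.startswith("-"):
        problems.append("starts with a dash")
    # Report each offending character once, in a stable order.
    for ch in sorted(set(name) & AWKWARD_CHARS):
        problems.append(f"contains {ch!r}")
    return problems
```

An empty result means the name passed; a Save As dialog could show the returned reasons next to the text field.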

ThinkBonobo
  • This is a pretty good list. I would also suggest excluding "!" though, which might be used for history expansion when typed interactively. Oh, and leading periods (hidden) and "<" or ">" (redirection). – Steve Jorgensen Mar 29 '19 at 05:36
  • And keep in mind that you still might run across spaces, tabs, and newlines in filenames in Unix. Your code shouldn't blow up just for seeing that. – Randal Schwartz Jan 03 '21 at 19:09
  • @ThinkBonobo would the list also apply to the latest macOS? I ask because it is based on UNIX, I think. – Jules Manson Jul 03 '22 at 20:39
  • @JulesManson the list I gave was meant to best support cross-OS use, so it should also apply to macOS. Note that Macs will probably support some of the items in the list, but I avoid them anyway to prevent compatibility issues elsewhere. – ThinkBonobo Jul 14 '22 at 15:02
23

Often forgotten: the colon (:) is not a good idea, since it's commonly used as the separator in variables like $PATH, i.e. the list of directories where executables are found "automatically". It can also cause confusion with DOS/Windows paths, where of course the colon is used in drive names.
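
A small illustration of the $PATH problem in Python, assuming a hypothetical directory `/opt/my:tools/bin`. The helper `split_search_path` is made up for this sketch; it splits on `:` only, which is also the value of `os.pathsep` on Unix.

```python
def split_search_path(value: str) -> list[str]:
    """Split a PATH-style string on ':' with no escaping, as on Unix."""
    return value.split(":")

# Hypothetical directory whose name contains a colon.
odd_dir = "/opt/my:tools/bin"
search_path = ":".join(["/usr/bin", odd_dir])

# The colon inside the directory name is indistinguishable from a
# separator, so the directory is cut into two unrelated entries.
entries = split_search_path(search_path)
```

Here `entries` has three items, and none of them is the directory that was added.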

unwind
  • also ldd on linux can get confused looking for rpaths if there are colons present – Jon Jan 28 '16 at 04:36
  • If you have a colon in the file name, and you use that partition on Windows and delete the file, it will result in file system corruption. It can be resolved with Windows' "repair disk" tool though. – Kenji Nov 30 '16 at 13:11
11

Do not forget that you can add a dot (.) at the beginning to hide files and folders... Otherwise, I'd follow a *NIX name convention (from Wikipedia):

Most UNIX file systems

  • Case handling: case-sensitive, case-preserving
  • Allowed character set: any.
  • Reserved characters: /, null.
  • Max length: 255.
  • Notes: A leading . indicates that ls and file managers will not by default show the file
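
A minimal sketch of the length and hidden-file rules from this list in Python. Two assumptions are made here: the 255 limit is counted in bytes, and names are stored as UTF-8, so a name with accented characters can hit the limit with fewer than 255 characters. The names `NAME_MAX`, `fits_name_limit`, and `is_hidden` are hypothetical.

```python
NAME_MAX = 255  # Assumed to be a limit in bytes, per the list above.

def fits_name_limit(name: str) -> bool:
    """Return True if the UTF-8 encoded name is within NAME_MAX bytes."""
    # Assumption: the file system stores names as UTF-8 bytes.
    return len(name.encode("utf-8")) <= NAME_MAX

def is_hidden(name: str) -> bool:
    """Return True if ls and file managers would hide the name by default."""
    return name.startswith(".")
```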

Link to wikipedia article about file names

Nolan Akash
Tobias Wärre
10

Encode FTW

As Bombe points out in their answer, restricting user input is at least frustrating if not downright annoying. However, as developers we should assume that every interaction with our code is malicious and treat it as such.

To solve both problems in a practical application, rather than white-or-black-listing certain characters, we should simply not use the user input as the file name.

Instead, use a safe name (hex chars [a-f0-9] only for ultimate safety) of our own devising, either encoded from the user input (e.g. PHP's bin2hex), or a randomly generated ID (e.g. PHP's uniqid) which is then mapped by some method (take your pick) to the user input.

Encoding/decoding can be done on the fly with no reliance on mapping, so is practically ideal. The user never needs to know what the file is really called; as long as they can get/set the file, and it appears to be called what they wanted, everyone's a winner.
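
A minimal sketch of the encoding approach in Python, playing the role that PHP's bin2hex has above. The function names `encode_name` and `decode_name` are hypothetical, and UTF-8 is an assumed encoding for the user's text.

```python
def encode_name(user_name: str) -> str:
    """Turn any user-supplied name into a string of [0-9a-f] only."""
    return user_name.encode("utf-8").hex()

def decode_name(stored_name: str) -> str:
    """Recover the user's original name from the stored hex string."""
    return bytes.fromhex(stored_name).decode("utf-8")
```

Hex encoding doubles the length, so with a 255-character limit on stored names this leaves room for about 127 bytes of user input; longer names would need the random-ID-plus-mapping variant instead.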

By this methodology, the user can call their file whatever they like, hackers will be the only people frustrated, and your file system will love you :-)

Community
Fred Gandt
  • 1
    Excellent advice! It's the same principle as storing names as `name` rather than trying to enforce `first` and `last` separately (which makes me *so mad*). Or when I run into _any_ restrictions on passwords other than _minimum_ length. ("No spaces allowed?!? For what earthly reason!?") Obviously this is more appropriate in some situations than others. Sometimes you _have_ to let the user specify the actual file name for perfectly valid reasons. – DaveGauer May 24 '18 at 17:31
-3

Let the user enter whatever name he wants. Artificially restricting the range of characters will only annoy the users and serve no real purpose.

Bombe
  • 9
    Or, better: '$(rm -fr $HOME)' (minus the single quotes) as the file name? That will wreak havoc sooner rather than later. Backticks and $(...) are particularly pernicious as they 'work' when the file name is quoted, unlike most of the other special characters. Embedded quotes are tricky, too. – Jonathan Leffler Jan 19 '09 at 15:51
  • 8
    Those are all non-issues when saving the filename. fopen() doesn’t care about your filenames. When using a graphical shell (e.g. konqueror) it doesn’t care about your filenames. When you use auto-completion in the shell it doesn’t care about your filenames. So what are your points? :) – Bombe Jan 19 '09 at 15:53
  • 3
    @Bombe, what one user might want in many cases will alienate other users, regardless of the havoc it plays with your UI development process. Bad idea. – dkretz Jan 19 '09 at 16:52
  • 9
    That’s my point: choosing strange names will not wreak havoc with anything—unless your “anything” is badly written. None of the standard tools of UNIX is badly written. Again: what’s your point? – Bombe Jan 19 '09 at 17:13
  • 3
    What a short sighted answer from someone that really should know better. Your answer didn't even properly answer the original question. They say `The name will not be difficult to manipulate later in terms of escaping special characters, etc.`. People have noted here that there are quite a few characters that *can* be in valid file names, but realistically cause a bunch of problems. – JamEngulfer Dec 14 '15 at 12:31
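
A minimal sketch in Python of the two ways a program can hand such a name to other tools without the shell interpreting it, using the example name from the comments above. The helper names `build_argv` and `build_shell_command` are hypothetical, and `ls -l` is only a placeholder command; nothing is executed here.

```python
import shlex

def build_argv(name: str) -> list[str]:
    """Build an argument list; no shell is involved, so nothing expands."""
    # "--" stops option parsing, which also covers names starting with "-".
    return ["ls", "-l", "--", name]

def build_shell_command(name: str) -> str:
    """Build a shell command line with the name safely quoted."""
    return "ls -l -- " + shlex.quote(name)

hostile = "$(rm -fr $HOME)"
command = build_shell_command(hostile)
```

Passing the list from `build_argv` to `subprocess.run` avoids the shell entirely; `build_shell_command` is for the cases where a command string is unavoidable.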