
When a user passes a directory name to my program, I check it against

private static final Pattern DIRECTORY_PATTERN =
        Pattern.compile("/*?([a-zA-Z_0-9]+)/*?", Pattern.CASE_INSENSITIVE);

From what we've seen so far, this works, but I suspect the regex is incomplete.

Do you know of, or can you suggest, a more complete regex that would validate a directory name?

James Raitsev
  • What is an example of a string you want it to not accept? As far as I know, any string can be a valid directory name. – Random832 Jul 10 '12 at 16:17
  • This is OS- and file-system-dependent. http://stackoverflow.com/questions/537772/what-is-the-most-correct-regular-expression-for-a-unix-file-path provides a nice explanation. – Chris Dargis Jul 10 '12 at 16:19

2 Answers


Actually, there are a great many more characters you can use in a file name, even heinous things like backspaces and newline characters. In fact, you may find it depends on the underlying file system. I vaguely remember a rule somewhere that allowed everything except the actual path separator.
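To make that rule concrete, here is a minimal Java sketch of a check for a single directory-name component on a POSIX-style file system. It assumes the only forbidden characters are '/' and NUL, and that "." and ".." should be refused because they already name existing directories. The class and method names are hypothetical, and length limits are deliberately not checked.

```java
class PosixNameCheck {
    // Sketch of the "everything except the separator" rule for one path
    // component. Assumption: NUL is also forbidden, and "." / ".." are
    // refused because they always refer to existing directories.
    static boolean isPosixComponent(String name) {
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return false;
        }
        // Only the separator and the NUL character are rejected.
        return name.indexOf('/') < 0 && name.indexOf('\0') < 0;
    }
}
```

Note that names containing a newline or a backspace, or starting with '-', all pass. That is the point being made: the file system's own rule is far looser than the regex in the question.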

One approach I always consider when deciding whether something is valid is simply to use it. For example, you can validate the format of an email address with a (complex) regex, but the only way to be certain it's fully valid is to send it an email containing a verification link and confirm that it's received.

In your particular case, if you want to create a file with that name, you can try to create a temporary file in a directory you're actually allowed to create files in. If the file is created successfully, you can be pretty sure it's a valid name :-) Of course, if you're creating a file anyway, you may just want to create the real one. If you're opening an existing file, forget the regex and just try to open it: no amount of complication in your regex will tell you whether the file exists or is readable by you.
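A rough Java sketch of the "just try it" approach using java.nio.file. It assumes the name should be a single directory component and that creating a throwaway directory under the system temp location is acceptable. The class and method names are hypothetical, and a true result only means that this particular file system accepted the name at that moment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

class TryCreateDir {
    // Returns true if 'name' could be created as a directory directly inside
    // a fresh scratch directory. Proves validity for this file system only.
    static boolean canCreateDirectoryNamed(String name) {
        Path scratch;
        try {
            scratch = Files.createTempDirectory("dirname-check");
        } catch (IOException e) {
            throw new UncheckedIOException("could not create scratch directory", e);
        }
        Path created = null; // set only if we actually created something
        try {
            Path candidate = scratch.resolve(name); // may throw InvalidPathException (e.g. NUL)
            // Refuse names that do not land directly inside 'scratch':
            // "a/b" is nested, "" resolves to scratch itself, "/etc" is absolute.
            if (!scratch.equals(candidate.getParent())) {
                return false;
            }
            created = Files.createDirectory(candidate); // ".." and "." fail here: they already exist
            return true;
        } catch (InvalidPathException | IOException e) {
            return false;
        } finally {
            cleanUp(created, scratch);
        }
    }

    // Deletes only what this class created; never touches ".." or the temp root.
    private static void cleanUp(Path created, Path scratch) {
        try {
            if (created != null) {
                Files.deleteIfExists(created);
            }
            Files.deleteIfExists(scratch);
        } catch (IOException e) {
            // Best effort: a leftover empty temp directory is harmless here.
        }
    }
}
```

Because it touches the disk, this is slower than a regex. If you are about to create the real directory anyway, calling Files.createDirectory on the real path and handling the exception does the same job in one step.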

To be frank, though, I'd consider placing your own limitations on the allowed characters. I have, in the past, cursed people who were silly enough to create file names with CTRL characters in them, or one called -rf that the rm command had trouble with (until you figure out how to get around that).
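If you do restrict names yourself, here is one possible house rule, purely an assumption to adapt: 1 to 64 ASCII letters, digits, '_', '-' or '.', with the first character a letter, digit or '_' so that names like "-rf" and ".hidden" are refused. The class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

class ConservativeDirName {
    // Hypothetical house rule, not a file-system requirement:
    // first char letter/digit/underscore, then up to 63 of [A-Za-z0-9_.-].
    // Inside a character class, '.' is literal and a trailing '-' is literal.
    private static final Pattern SAFE_DIRECTORY_NAME =
            Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_.-]{0,63}");

    static boolean isSafe(String name) {
        // matches() requires the WHOLE input to match, unlike find().
        return SAFE_DIRECTORY_NAME.matcher(name).matches();
    }
}
```

Two details differ from the pattern in the question. First, matches() anchors the regex to the whole input; if the original pattern is used with find(), it accepts any string that merely contains one letter, digit or underscore. Second, CASE_INSENSITIVE is dropped because both cases are already listed. This rule does not reject Windows device names such as CON.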

paxdiablo
  • You should definitely read the edit. Don't allow things like slash '/' or the null character '\0', and it might be better to disallow semicolons as well – Ghost Jul 10 '12 at 16:30

This is file-system specific. Check the documentation for the file systems you expect to work with for the list of accepted characters and directory-name limitations. Your pattern already misses lots of punctuation and thousands of non-Latin characters that pretty much every modern file system accepts.
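A sketch of a blocklist check in this spirit, which accepts punctuation and non-Latin letters and refuses only what is known to cause trouble. It assumes you want names that work on both POSIX systems and Windows, and encodes the restrictions from Microsoft's "Naming Files, Paths, and Namespaces" guidance: the characters < > : " / \ | ? *, control characters 0 to 31, the device names CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9 (also when followed by an extension), and a trailing space or period. Length limits differ between file systems and are not checked; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PortableDirName {
    // Characters Windows forbids in names; '/' is also the POSIX separator.
    private static final String FORBIDDEN_CHARS = "<>:\"/\\|?*";

    // Windows device names, reserved with or without an extension.
    private static final Set<String> RESERVED_NAMES =
            new HashSet<>(Arrays.asList("CON", "PRN", "AUX", "NUL"));

    static {
        for (int i = 1; i <= 9; i++) {
            RESERVED_NAMES.add("COM" + i);
            RESERVED_NAMES.add("LPT" + i);
        }
    }

    static boolean isPortable(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // Microsoft advises against a trailing space or period;
        // this also rejects "." and "..".
        char last = name.charAt(name.length() - 1);
        if (last == ' ' || last == '.') {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // c < 0x20 covers NUL (forbidden on POSIX too) and all other control characters.
            if (c < 0x20 || FORBIDDEN_CHARS.indexOf(c) >= 0) {
                return false;
            }
        }
        // "nul.txt" is treated like "NUL", so compare the part before the first dot.
        int dot = name.indexOf('.');
        String base = (dot < 0 ? name : name.substring(0, dot)).toUpperCase(Locale.ROOT);
        return !RESERVED_NAMES.contains(base);
    }
}
```

Unlike the regex in the question, this accepts names such as "Отчёт 2012" or "draft (v2); final", while still refusing "a:b", "con.txt" and names containing a tab.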

Oleg V. Volkov