
I need to check whether a Windows folder name is potentially valid. The folder does not necessarily exist yet. It can be an absolute path, a relative one, or it can be located on another machine in the network (addressed via UNC).

The following are valid locations:

[[X:]\]                     'Including an empty string.
\\server name\              'Including \\123.123.123.123\

Note that Windows also accepts / instead of \ (to verify, enter C:/Users in the file explorer).

The location can be followed by a path of any depth and must end in a folder name, with or without a trailing slash:

[folder name\]*folder name[\]

None of the characters /\:*?<>"| may appear within server names or folder names.

This can be done by matching the provided text against a regular expression, so I wrote the following one:

^                             'At the beginning of the string must appear 
    (                         'either
        \\{2}                 '2 backslashes
        [^/\\\:\*\?\<\>\|]+   'followed by a server name
        (\\|/)                'and a slash,
    |                         'or
        (                     'a sequence of
            (\.{2})           '2 dots
            (\\|/)            'followed by a slash
        )+                    'which must occur at least once
    |                         'or
        [A-Za-z]              'a drive letter
        \:                    'followed by a colon
        (\\|/)                'and a slash
    |                         'or
        (\\|/)                'simply a slash
    )?                        'or nothing at all;
(                             'followed by a sequence of
    [^/\\\:\*\?\<\>\|]+       'a folder name
    (\\|/)                    'followed by a slash
)*                            'which may occur multiple times
[^/\\\:\*\?\<\>\|]+           'The last folder needs no final slash
(\\|/)?                       'but may have one.
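
The annotated layout above can be kept almost verbatim in code: with RegexOptions.IgnorePatternWhitespace, .NET ignores unescaped blanks in the pattern and treats everything from # to the end of the line as a comment. The following is only a sketch under a few assumptions: the module and member names are hypothetical, [\\/] stands in for (\\|/), and a \z anchor is added so that the whole input has to match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CommentedPatternSketch
    'One pattern fragment per line, each followed by a "#" comment.
    Private ReadOnly PatternLines As String() = {
        "^                             # at the beginning of the string",
        "(?: \\{2}[^/\\:*?<>|]+[\\/]   # two backslashes, a server name, a slash",
        "  | (?: \.{2}[\\/] )+         # or one or more '..' steps with a slash",
        "  | [A-Za-z]:[\\/]            # or a drive letter, a colon and a slash",
        "  | [\\/]                     # or simply a slash",
        ")?                            # or nothing at all;",
        "(?: [^/\\:*?<>|]+[\\/] )*     # any number of folder names with a slash",
        "[^/\\:*?<>|]+                 # the last folder name",
        "[\\/]?                        # with an optional final slash",
        "\z                            # and nothing after it"
    }

    'Blanks and "#" comments in the pattern are ignored by this option.
    Private ReadOnly DirRegex As New Regex(
        String.Join(Environment.NewLine, PatternLines),
        RegexOptions.IgnorePatternWhitespace)

    Function MatchesLayout(sDir As String) As Boolean
        Return DirRegex.IsMatch(sDir)
    End Function

    Sub Main()
        For Each s As String In New String() {"c:/Users", "../../x/", "a:b"}
            Console.WriteLine("""" & s & """ -> " & MatchesLayout(s).ToString())
        Next
    End Sub
End Module
```

Blanks inside a character class are still taken literally in this mode, so the classes are written without any.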

The expression is used in the following function:

'Requires "Imports System.Text.RegularExpressions" at the top of the file.
Private Function IsDirValid(sFile As String) As Boolean
    Dim sPattern As String = "^(\\{2}[^/\\\:\*\?\<\>\|]+(\\|/)" &
                                "|((\.{2})(\\|/))+" &
                                "|[A-Za-z]\:(\\|/)" &
                                "|(\\|/)" &
                             ")?" &
                             "([^/\\\:\*\?\<\>\|]+(\\|/))*" &
                             "[^/\\\:\*\?\<\>\|]+(\\|/)?"
    Dim oMatch As Match = Regex.Match(sFile, sPattern)

    'Debug.Print("""" & sFile & """ returns """ & oMatch.Value & """")

    Return (sFile = oMatch.Value)
End Function

This seems to work reasonably well. All of these expressions are recognized as valid:

path name[/]
path name/path name/path name[/]
/path name[/]
/path name/path name/path name[/]
../../path name[/]
../../path name/path name/path name[/]
c:/path name[/]
c:/path name/path name/path name/file name[/]
\\server name/path name[/]
\\server name\path name\path name\path name[/]

(Did I miss any?)

My only remaining problem is that each path name may have leading and trailing whitespace, which is not allowed in path names. Blanks within a name, however, are allowed.

Of course, I could replace the 3 occurrences of

[^/\\\:\*\?\<\>\|]+

by

[^/\\\:\*\?\<\>\|\ ][^/\\\:\*\?\<\>\|]*[^/\\\:\*\?\<\>\|\ ]

This solves the whitespace problem (tested) but introduces another one: names now need to be at least 2 characters long, which is of course unacceptable. And it is getting ugly.

Alas, I was not able to find a suitable quantifier for my problem in the regex quick reference guide.

Hence: is there a more concise way?

  • See [Validate folder name in C#](https://stackoverflow.com/questions/12688985/validate-folder-name-in-c-sharp), did you try [`Path.GetInvalidPathChars()`](https://msdn.microsoft.com/en-us/library/system.io.path.getinvalidpathchars(v=vs.110).aspx)? And for the UNC path, [What is the correct way to check if a path is an UNC path or a local path](https://stackoverflow.com/questions/520753). – Wiktor Stribiżew Jun 23 '17 at 06:28
  • @WiktorStribiżew : thank you. Does your suggestion to use `GetInvalidPathChars` imply that `[^/\\\:\*\?\<\>\|\ ][^/\\\:\*\?\<\>\|]+[^/\\\:\*\?\<\>\|\ ]` is not correct? If you found a flaw justifying usage of said function, just let me know please. Of course, it's not a really elegant beast. I was hoping for a quantifier of some sort stating "may not start/end with whitespace". Also a kind of macro would help. –  Jun 23 '17 at 07:04

1 Answer

One possible solution, albeit not a very pretty one, is to substitute

[^/\\\:\*\?\<\>\|]+

with this construct:

(([^/\\\:\*\?\<\>\|\ ][^/\\\:\*\?\<\>\|]*[^/\\\:\*\?\<\>\|\ ])|[^/\\\:\*\?\<\>\|\ ])

which reads:

(1) as the first character, allow any permitted character except a blank; then allow any number of permitted characters (including blanks); and as the last character, again allow any permitted character except a blank;

or

(2) as the only character, allow any permitted character except a blank.
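
The two alternatives can also be folded into one: (A B* A | A) matches the same strings as A(B*A)?, where A is the class without the blank and B is the class with it. Here is a minimal sketch of a single-name check built that way. The names NameCheckSketch, EDGE, ANYCHAR and IsNameValid are made up for illustration, the double quote is added to the forbidden characters, and \z is used so that the whole input has to match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NameCheckSketch
    'ANYCHAR: every character permitted in a name, including the blank.
    'EDGE: the same set without the blank, for the first and last position.
    Private Const ANYCHAR As String = "[^/\\:*?<>""|]"
    Private Const EDGE As String = "[^/\\:*?<>""| ]"

    'EDGE(ANYCHAR*EDGE)? is equivalent to (EDGE ANYCHAR* EDGE | EDGE).
    Private ReadOnly NameRegex As New Regex(
        "^" & EDGE & "(?:" & ANYCHAR & "*" & EDGE & ")?\z")

    Function IsNameValid(sName As String) As Boolean
        Return NameRegex.IsMatch(sName)
    End Function

    Sub Main()
        'One-character names and in-name blanks are accepted; leading or
        'trailing blanks, empty strings and forbidden characters are not.
        For Each s As String In New String() {"a", "my folder", " lead", "trail ", "", "a:b"}
            Console.WriteLine("""" & s & """ -> " & IsNameValid(s).ToString())
        Next
    End Sub
End Module
```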

The regex from the question was modified quite a bit, as can be seen in the code snippet below. If required, I can describe it in more detail.

To test, you could use this function (in production code, make sure that the regex is constructed only once):

'Test if the provided folder name is potentially valid (it does not need to
'exist already).
Private Function IsDirValid(sDir As String) As Boolean
    'There are two sets of disallowed characters in server, share and folder
    'names. Leading and trailing blanks and dots are not allowed; blanks and
    'dots within a name are ok. \"" escapes a single double quote mark, as
    'VB syntax requires. SLASH is defined just for readability. {DRIVE} and
    '{LOCALDIR} are used twice each, so they are defined as constants too.
    Const ALLOWBLANKDOT As String = "[^\x00-\x1F/\\\:\*\?\<\>\""\|]"
    Const FORBIDBLANKDOT As String = "[^\x00-\x1F/\\\:\*\?\<\>\""\|\ \.]"
    Const SLASH As String = "(\\|/)"
    Const DRIVE As String = "((\\{2}\?\\)?[A-Za-z]\:)"
    Const LOCALDIR As String = "({NAME}|\.{1,2}({SLASH}(\.{2}))*)"

    'Qualify zero-length strings as False. Paths may be only 260 characters
    'long, including a terminating NUL character and 12 characters to
    'specify the short 8.3 DOS name. Because this limit also includes a file
    'name not evaluated here, at least two more characters (for the slash and
    'at least one file name character) are subtracted, for a maximum
    'path length of 245 characters.
    If sDir.Length = 0 OrElse sDir.Length > 245 Then Return False

    'The text identifying a single path level is lengthy and appears multiple
    'times. For clarity, it is written as {NAME} in a first step, which is
    'then replaced by an abstraction of which characters can be used at
    'which position. Finally, the abstractions are replaced by the actual
    'regex, ensuring that names can contain in-name spaces and dots, but
    'no leading or trailing spaces or dots.
    '{SLASH} is just used for enhanced readability.
    Dim sPattern As String = "^(\\{2}(\?\\UNC\\)?{NAME}{SLASH}{NAME}" &
                                 "|(({DRIVE}{SLASH}?{LOCALDIR}?)" &
                                  "|({DRIVE}?{SLASH}?{LOCALDIR}))" &
                                 "|{NAME}" &
                             ")?" &
                             "({SLASH}{NAME})*" &
                             "{SLASH}?"
    sPattern = Replace(sPattern, "{DRIVE}", DRIVE)
    sPattern = Replace(sPattern, "{LOCALDIR}", LOCALDIR)
    sPattern = Replace(sPattern, "{NAME}",
        "(({FORBIDBLANKDOT}{ALLOWBLANKDOT}*{FORBIDBLANKDOT})" &
         "|{FORBIDBLANKDOT})")
    sPattern = Replace(sPattern, "{ALLOWBLANKDOT}", ALLOWBLANKDOT)
    sPattern = Replace(sPattern, "{FORBIDBLANKDOT}", FORBIDBLANKDOT)
    sPattern = Replace(sPattern, "{SLASH}", SLASH)

    Dim oMatch As Match = Regex.Match(sDir, sPattern)

    Debug.Print("""" & sDir & """ returns """ & oMatch.Value & """")

    Return (sDir = oMatch.Value)
End Function
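
One way to make sure the regex is constructed only once is to keep it in a read-only module-level field. The sketch below illustrates that idea with a deliberately simplified stand-in pattern (optional drive, optional leading slash, slash-separated names), not the full pattern above. All names in it are hypothetical. It also anchors the pattern with \A and \z and calls IsMatch, instead of comparing the input with the match value.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CachedRegexSketch
    'Simplified stand-in for the full pattern: an optional drive, an optional
    'leading slash, and folder names separated by slashes.
    Private Const EDGE As String = "[^\x00-\x1F/\\:*?<>""| .]"
    Private Const ANYCHAR As String = "[^\x00-\x1F/\\:*?<>""|]"
    Private Const NAME As String = EDGE & "(?:" & ANYCHAR & "*" & EDGE & ")?"
    Private Const SLASH As String = "[\\/]"

    'Constructed once, when the module is first used. \A and \z anchor the
    'pattern to the whole input.
    Private ReadOnly DirRegex As New Regex(
        "\A(?:[A-Za-z]:)?" & SLASH & "?" & NAME &
        "(?:" & SLASH & NAME & ")*" & SLASH & "?\z")

    Function IsSimplePathValid(sDir As String) As Boolean
        If String.IsNullOrEmpty(sDir) OrElse sDir.Length > 245 Then Return False
        Return DirRegex.IsMatch(sDir)
    End Function

    Sub Main()
        For Each s As String In New String() {"c:/path name/", "path name", "c:/a /b", " c:/x"}
            Console.WriteLine("""" & s & """ -> " & IsSimplePathValid(s).ToString())
        Next
    End Sub
End Module
```

Anchoring at both ends avoids depending on which of several possible matches Regex.Match returns first.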

IsDirValid accepts all of the following forms, as per Microsoft's specification for path names:

[/]path name[/] 
[/]path name/path name/path name[/]
./path name[/]
./../path name/path name/path name[/]
../path name[/]
../../path name/path name/path name[/]
c:
c:path name[/]
c:path name/path name/path name/file name[/]
c:/path name[/]
c:/path name/path name/path name/file name[/]
c:.[/]
c:./path name[/]
c:./path name/path name/path name/file name[/]
c:..[/]
c:/../path name[/]
c:/../path name/path name/path name/file name[/]
\\?\c:path name[/]
\\server name/share name[/]
\\server name\share name\path name\path name[/]
\\?\UNC\server name\share name[/]

If someone comes up with a more elegant piece of work, please let me know.