
I am a bit confused by regex syntax. I need to build two separate regex patterns that detect whether a filename is legal in Windows. The first should match any name that does not contain these (illegal) characters:

* < > : " / \ | ?

The second should match any name except these (reserved file names):

PRN, AUX, CLOCK, NUL, CON, COM, LPT

I found a combined version of this pattern that looks like this: @"^(?!(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d)(?:\..+)?$)[^\x00-\x1F\xA5\\?*:\"";|\/<>]+(?<![\s.])$", but the key point is that I need to split it into the two separate patterns.

Could anyone help me? Thank you in advance.

Wiktor Stribiżew
nobodyMGB
  • You can also check this: https://stackoverflow.com/questions/3137097/check-if-a-string-is-a-valid-windows-directory-folder-path/16526391 – Cihan Yakar Jul 13 '20 at 14:57
  • 1) `^(?!.*(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d)(?:\..+)?$).*`, 2) `^[^\x00-\x1F\xA5\\?*:"";|/<>]+(?<![\s.])$` – Wiktor Stribiżew Jul 13 '20 at 15:04
  • Regexes cannot validate filenames because what names are valid [is not determined by the characters in the name](https://stackoverflow.com/questions/1976007/). To validate a filename you have to create the file and see if it throws. – Dour High Arch Jul 13 '20 at 16:28
  • If you're on Windows, the rules for DOS devices and trailing spaces and dots should first be checked by passing the component name to WinAPI `GetFullPathNameW`. If the result is different (e.g. "con" -> "\\.\con", or "spam . . ." -> "spam"), then it's not a valid name. Always let the OS decide this first. – Eryk Sun Jul 13 '20 at 17:05
  • "COM0", "LPT0", and "CLOCK$" are not reserved. "CONIN$" and "CONOUT$" are. Case-insensitive DOS device names may be followed by zero or more spaces up to a colon or dot plus any characters after that (e.g. "nul .txt" or "prn :whatever"). ";" is not reserved, but it's discouraged because it's a delimiter in `PATH` and `PATHEXT`. "¥" (0xA5) is not reserved. The glyph "¥" is just how backslash may be displayed in a Japanese locale. Trailing dots and spaces (only " ") are reserved, but not trailing whitespace. – Eryk Sun Jul 13 '20 at 17:12
  • I'm not a regex expert, but maybe this will work, or someone can at least improve on it: `(?i)^(?!(?:NUL|PRN|AUX|CON|CONIN\$|CONOUT\$|COM[1-9]|LPT[1-9])(?: *\.+.*)?$)[^\x00-\x1F\\\/?*<>\"|:]+(?<![ .])$`. Note the addition of the case-insensitive flag `(?i)` and the optional suffix of a DOS device name `(?: *\.+.*)?` that matches spaces followed by any dot extension; in this case colon is already handled as a reserved character. – Eryk Sun Jul 13 '20 at 17:13
  • @DourHighArch: opening a reserved DOS device name such as "C:/Temp/con :spam" will not necessarily fail. In this case it will open "//./con" if the process is attached to a console. Similarly, creating a file named "spam . . ." won't fail, but instead creates "spam". You have to check for a reserved name via `GetFullPathNameW` when name validation is a concern. – Eryk Sun Jul 13 '20 at 17:21

1 Answer


There are actually three things this regex checks for.

You can validate any regex here: https://regex101.com/

This matches a reserved file name at any position in the text:

    (?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d)

but to match the exact name you need ^ and $, which indicate the start and end of the text, so this will work for the second group (a match means the name is reserved):

#1

    ^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d)$
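
As a rough illustration of how #1 could be used, here is a minimal sketch in Python's `re` module (an assumption; the question itself appears to target .NET, where the same pattern syntax applies). The helper name `is_reserved` is hypothetical. Because the pattern matches the reserved names themselves, the caller negates the result to get "legal" names.

```python
import re

# Pattern #1 from above, unchanged. A match means the name IS reserved.
RESERVED = re.compile(r"^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d)$")


def is_reserved(name: str) -> bool:
    """Return True when the whole name is one of the listed reserved names.

    Hypothetical helper for illustration only. The check is case-sensitive,
    exactly like the pattern as written.
    """
    return RESERVED.match(name) is not None


if __name__ == "__main__":
    for candidate in ["CON", "COM1", "CONSOLE", "con", "COM"]:
        verdict = "reserved" if is_reserved(candidate) else "not reserved"
        print(f"{candidate!r}: {verdict}")
```

The anchors are what keep "CONSOLE" from being flagged. Note that the pattern as written does not flag lowercase "con"; a case-insensitive flag would be needed for that.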

And for the invalid characters, this will exclude all the given characters:

#2

    ^[^\x00-\x1F\xA5\\?*:\"";|\/<>]+$
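
A small sketch of #2 in use, again assuming Python's `re` module. The doubled quote (`""`) in the original looks like C# verbatim-string escaping, so it is written once here; that adaptation and the helper name `has_only_legal_chars` are assumptions, not part of the original pattern.

```python
import re

# Pattern #2 from above, with the C#-style doubled quote written once.
# The character class lists everything that must NOT appear in the name.
LEGAL_CHARS = re.compile(r'^[^\x00-\x1F\xA5\\?*:";|/<>]+$')


def has_only_legal_chars(name: str) -> bool:
    """Return True when the name is non-empty and contains none of the
    excluded characters. Hypothetical helper for illustration only."""
    return LEGAL_CHARS.match(name) is not None


if __name__ == "__main__":
    for candidate in ["report.txt", "a:b", "what?.txt", "a/b", ""]:
        print(f"{candidate!r}: {has_only_legal_chars(candidate)}")
```

The `+` quantifier means the empty string is rejected as well.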

but it still allows ., a. and .a. To exclude those three possibilities you need to:

first add (?!(?:\..+)?$) to the beginning to exclude names starting with . (that is, \.)

then add (?<![.]) to the end to exclude names ending with .

#3

    ^(?!(?:\..+)?$).+(?<![.])$

But this will allow many other possibilities with whitespace at the beginning and the end.

At this point you can either trim the text, ignore the invalid names (Windows will trim them) or extend your regex as follows:

adding \x20 excludes "a " and "a. " but still allows " a" and " .a"

    ^(?!(?:\..+)?$).+(?<![\x20.])$

adding (?!(?:\x20+.+)?$) excludes leading spaces:

#3

    ^(?!(?:\x20+.+)?$)(?!(?:\..+)?$).+(?<![\x20.])$
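
To see what this last pattern accepts and rejects, here is a minimal sketch assuming Python's `re` module, which supports the same lookahead and fixed-width lookbehind syntax. The helper name `has_clean_edges` is hypothetical. The pattern is stricter than Windows itself about leading spaces and dots.

```python
import re

# The final pattern from above, unchanged.
#   (?!(?:\x20+.+)?$)  rejects the empty string and names starting with a space
#   (?!(?:\..+)?$)     rejects names starting with a dot
#   (?<![\x20.])$      rejects names ending with a space or a dot
EDGES = re.compile(r"^(?!(?:\x20+.+)?$)(?!(?:\..+)?$).+(?<![\x20.])$")


def has_clean_edges(name: str) -> bool:
    """Return True when the name does not start or end with a space or dot.

    Hypothetical helper for illustration only.
    """
    return EDGES.match(name) is not None


if __name__ == "__main__":
    for candidate in ["a.txt", "a b.txt", "a.", "a ", " a", ".a", "."]:
        print(f"{candidate!r}: {has_clean_edges(candidate)}")
```

A full check would combine this with the character test (#2) and the negated reserved-name test (#1), for example by requiring all three conditions in code rather than in one regex.
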
Bizhan
  • Leading whitespace and dots are allowed. It's only trailing dots and spaces that get trimmed, and in this case it's only exactly " " (0x20), not the general character class of whitespace (`\s`). – Eryk Sun Jul 13 '20 at 17:18
  • @ErykSun you are right, thank you! I took that into my answer – Bizhan Jul 13 '20 at 19:22