I need to process some file paths in C# that potentially contain illegal characters, for example:

C:\path\something\output_at_13:26:43.txt

In that path, the colons in the timestamp make the filename invalid, and I want to replace them with another, safe character.

I've searched for solutions here on SO, but they seem to be all based around something like:

path = string.Join("_", path.Split(Path.GetInvalidFileNameChars()));

or something similar. These solutions don't work for me, however, because they also mangle the drive letter, and I get this output:

C_\path\something\output_at_13_26_43.txt

I tried using Path.GetInvalidPathChars() instead, but that doesn't work either: it doesn't include : among the illegal characters, so the colons in the filename are not replaced.

So, after figuring that out, I tried doing this:

string dir = Path.GetDirectoryName(path);
string file = Path.GetFileName(path);
file = string.Join(replacement, file.Split(Path.GetInvalidFileNameChars()));
dir = string.Join(replacement, dir.Split(Path.GetInvalidPathChars()));

path = Path.Combine(dir, file);

but this is not good either, because the :s in the filename seem to interfere with the Path.GetFileName() logic: it only returns the piece after the last :, so I'm losing parts of the path.

How do I do this "properly" without hacky solutions?

Master_T
  • https://stackoverflow.com/questions/6198392/check-whether-a-path-is-valid – Dmitry Bychenko Dec 17 '18 at 10:43
  • https://stackoverflow.com/questions/422090/in-c-sharp-check-that-filename-is-possibly-valid-not-that-it-exists – Dmitry Bychenko Dec 17 '18 at 10:43
  • The drive letter is always going to be 2 chars long, so you should do something like `string driveLetter = path.Substring(0, 2); path = path.Substring(2, path.Length-2);` thus you have "C:" in `driveLetter` and "\path\something\output_at_13:26:43.txt" in `path`. Apply the replacement in `path` and do a `Path.Combine(driveLetter, path);` – ikerbera Dec 17 '18 at 10:45
  • You beat me to it @ikerbera. You can leave out the second parameter in the second substring call, though. – Markus Deibel Dec 17 '18 at 10:46
  • @ikerbera - "The drive letter is always going to be 2 chars long" but only if it's present, which it doesn't have to be (e.g. relative paths, UNC paths). – Joe Dec 17 '18 at 10:49
  • @ikerbera: this doesn't work very well, because that will also replace the `\`s in the path (which are not legal for a filename). At the moment I'm using a similar solution, consisting of splitting the string manually on the last `\` and processing the left part as path and the right part as filename. I was wondering if there was a better solution tho. – Master_T Dec 17 '18 at 10:52
  • You should probably do the clean up at the moment you generate a filename, rather than constructing an absolute path and attempting to clean it up afterwards. E.g. `C:\path\something\output_on_21/12/2018.txt` contains no invalid path characters, so can't easily be cleaned up, but probably isn't what was intended. – Joe Dec 17 '18 at 10:53
  • @Joe: unfortunately, the paths come from an external application that I have no control over, so I have to work with what I get. – Master_T Dec 17 '18 at 10:59
  • @Master_T I don't think there's anything that can help you with that, apart from string manipulation. You can use `Path.GetPathRoot` and `Path.IsPathRooted` to make your life easier. @Joe you're right, I didn't take into account whether the drive letter is present. An `if (Path.IsPathRooted(path))` before processing the path should work. – ikerbera Dec 17 '18 at 11:03
  • @Master_T, sounds like a strange setup. Is it only the filename part that can contain illegal characters? If so you could use `Path.GetDirectoryName` to split off the directory, and do string manipulation on the filename part. – Joe Dec 17 '18 at 11:56
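
To make the drive-letter discussion in the comments above concrete, here is a minimal sketch, not a built-in API. It keeps an optional leading "X:" prefix, leaves both kinds of slashes alone, and replaces reserved characters everywhere else, so relative and UNC paths pass through the same code. The names `SegmentSanitizer`, `Sanitize` and `Reserved` are made up for illustration, and the reserved set is hard-coded to the usual Windows characters (an assumption) so that the result does not depend on the platform the code runs on.

```csharp
using System;

public static class SegmentSanitizer
{
    // Assumption: the usual Windows reserved characters, hard-coded on purpose.
    // '\' and '/' are deliberately absent so that separators survive.
    private static readonly char[] Reserved = { '<', '>', ':', '"', '|', '?', '*' };

    public static string Sanitize(string path, char replacement = '_')
    {
        // A drive prefix is exactly one ASCII letter followed by ':' at index 0.
        bool hasDrive = path.Length >= 2
            && ((path[0] >= 'A' && path[0] <= 'Z') || (path[0] >= 'a' && path[0] <= 'z'))
            && path[1] == ':';

        char[] chars = path.ToCharArray();

        // Skip the drive prefix (if any) and replace control or reserved characters.
        for (int i = hasDrive ? 2 : 0; i < chars.Length; i++)
        {
            if (chars[i] < ' ' || Array.IndexOf(Reserved, chars[i]) >= 0)
            {
                chars[i] = replacement;
            }
        }
        return new string(chars);
    }
}
```

Two caveats with this sketch: a relative file name such as `a:b.txt` is indistinguishable from a drive prefix and keeps its colon, and extended-length prefixes like `\\?\` would have their `?` replaced, so inputs of that kind need extra handling.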

2 Answers

You can write a simple sanitizer that iterates over each character and knows when to expect a colon as the drive separator. This one accepts a single letter (A-Z, either case) at the start of the string followed directly by a ":". It also detects path separators and does not escape them. It does not handle whitespace at the beginning of the input string, so if your input might have leading whitespace, you will have to trim it first or modify the sanitizer accordingly:

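using System.IO;    // Path
using System.Linq;  // Contains() on char[]
using System.Text;  // StringBuilder

enum ParserState {
    PossibleDriveLetter,
    PossibleDriveLetterSeparator,
    Path
}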

static string SanitizeFileName(string input) {
    StringBuilder output = new StringBuilder(input.Length);
    ParserState state = ParserState.PossibleDriveLetter;
    foreach(char current in input) {
        if (((current >= 'a') && (current <= 'z')) || ((current >= 'A') && (current <= 'Z'))) {
            output.Append(current);
            if (state == ParserState.PossibleDriveLetter) {
                state = ParserState.PossibleDriveLetterSeparator;
            }
            else {
                state = ParserState.Path;
            }
        }
        else if ((current == Path.DirectorySeparatorChar) ||
            (current == Path.AltDirectorySeparatorChar) ||
            ((current == ':') && (state == ParserState.PossibleDriveLetterSeparator)) ||
            !Path.GetInvalidFileNameChars().Contains(current)) {

            output.Append(current);
            state = ParserState.Path;
        }
        else {
            output.Append('_');
            state = ParserState.Path;
        }
    }
    return output.ToString();
}

Sefe
  • I think @Master_T was thinking about some method already existing in the .Net environment instead of having to make his own function. This is a very good solution, though. – ikerbera Dec 17 '18 at 11:13
  • @ikerbera: I don't know any built-in method that does that. I don't consider this solution "hacky" though, so I'd think it's within the scope of the question. – Sefe Dec 17 '18 at 11:17
  • +1 for interesting solution, although a bit overkill for what I need here, so I went with CodeCaster's one. Thanks. – Master_T Dec 17 '18 at 12:03

You definitely should make sure that you only receive valid filenames.

If you can't, and you're certain your directory names will be valid, you could split the path at the last backslash (assuming Windows) and reassemble the string:

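// Requires: using System.IO;
public static string SanitizePath(string path)
{
    // LastIndexOf returns -1 when there is no backslash; adding 1 then
    // yields 0, so the whole input is treated as the file name.
    var fileStart = path.LastIndexOf('\\') + 1;

    // Keep the trailing backslash with the directory part. Otherwise the loop
    // below would replace it too, since '\\' is an invalid file name char on Windows.
    var dir = path.Substring(0, fileStart);
    var file = path.Substring(fileStart);

    foreach (var invalid in Path.GetInvalidFileNameChars())
    {
        file = file.Replace(invalid, '_');
    }

    return dir + file;
}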
CodeCaster
  • "Split on last backslash", or slash: ideally using `Path.DirectorySeparatorChar` and `Path.AltDirectorySeparatorChar` to be OS-independent and .NET Core ready. – Joe Dec 17 '18 at 12:00
  • I went with this solution in the end, simple but effective in my case. – Master_T Dec 17 '18 at 12:01
  • Why look for the last backslash instead of using `Path.GetFileName`? – Sefe Dec 17 '18 at 12:31
  • @Sefe because the OP claims _"because the :s in the filename seem to interfere with the Path.GetFileName() logic, and it only returns the last piece after the last :"_. – CodeCaster Dec 17 '18 at 12:32
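
Following up on Joe's comment above, this is a rough sketch of the same split-and-replace idea using `Path.DirectorySeparatorChar` and `Path.AltDirectorySeparatorChar` instead of a hard-coded backslash. `PathSanitizer` and `SanitizeFileNamePart` are hypothetical names, not framework members. Keep in mind that both the separators and the result of `Path.GetInvalidFileNameChars()` depend on the platform; on Unix-like systems the invalid set is, as far as I know, much smaller and does not include `:`, so the colons from the question would only be replaced when this runs on Windows.

```csharp
using System;
using System.IO;

public static class PathSanitizer
{
    // Hypothetical helper: sanitizes only the last path segment and assumes
    // the directory part is already valid.
    public static string SanitizeFileNamePart(string path, char replacement = '_')
    {
        char[] separators = { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };

        // LastIndexOfAny returns -1 when no separator is present, so fileStart
        // becomes 0 and the whole input is treated as the file name.
        int fileStart = path.LastIndexOfAny(separators) + 1;

        string dir = path.Substring(0, fileStart);  // keeps the trailing separator
        string file = path.Substring(fileStart);

        // The set of invalid characters is platform-dependent.
        foreach (char invalid in Path.GetInvalidFileNameChars())
        {
            file = file.Replace(invalid, replacement);
        }
        return dir + file;
    }
}
```

On Windows, `PathSanitizer.SanitizeFileNamePart(@"C:\path\something\output_at_13:26:43.txt")` should keep `C:\path\something\` untouched and only rewrite the colons in the last segment.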