How to remove illegal characters from path and filenames?

Question

I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the code below, but it doesn't seem to do anything. What am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}

Trim removes characters from the beginning and end of strings. However, you probably should ask why the data is invalid, and rather than try and sanitize/fix the data, reject the data. — user7116, Sep 28 '08 at 15:54
Unix style names are not valid on Windows and i don't want to deal with 8.3 shortnames. — Gary Willoughby, Oct 16 '09 at 12:04
`GetInvalidFileNameChars()` will strip things like : \ etc from folder paths. — CAD bloke, May 20 '16 at 03:18
`Path.GetInvalidPathChars()` doesn't seem to strip `*` or `?` — CAD bloke, May 20 '16 at 03:24
I tested five answers from this question (timed loop of 100,000) and the following method is the fastest. The regular expression took 2nd place, and was 25% slower : public string GetSafeFilename(string filename) { return string.Join("_", filename.Split(Path.GetInvalidFileNameChars())); } — Brain2000, Jul 15 '16 at 15:20
I added a new fast alternative, and some benchmarks in [this answer](https://stackoverflow.com/a/64121323/1042409) — c-chavez, Sep 29 '20 at 14:07
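
The first comment explains why the code in the question appears to do nothing: Trim only strips matching characters from the two ends of the string and stops at the first character that is not in the set. A minimal sketch of the difference follows. It uses a small hard-coded character set (an assumption, chosen so the result does not depend on the platform) instead of Path.GetInvalidFileNameChars(), and the class and method names are hypothetical.

```csharp
using System;

public static class TrimVersusRemove
{
    // Hard-coded stand-in for Path.GetInvalidFileNameChars(), so the
    // behaviour is the same on every platform (illustrative assumption).
    private static readonly char[] Unwanted = { '<', '>', '*' };

    // Trim only strips unwanted characters from the start and the end.
    public static string TrimOnly(string s)
    {
        return s.Trim(Unwanted);
    }

    // Split + Concat removes them wherever they occur.
    public static string RemoveAll(string s)
    {
        return string.Concat(s.Split(Unwanted));
    }
}
```

With the input "<<a*b>>", TrimOnly returns "a*b" (the inner * survives), while RemoveAll returns "ab".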

score 584 · Answer 1 · edited Jul 16 '19 at 21:58


The original question asked to "remove illegal characters":

public string RemoveInvalidChars(string filename)
{
    return string.Concat(filename.Split(Path.GetInvalidFileNameChars()));
}

You may instead want to replace them:

public string ReplaceInvalidChars(string filename)
{
    return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));    
}

This answer comes from another thread, where it was posted by Ceres. I really like it because it is neat and simple.

answered Apr 20 '14 at 13:06 by Shehab Fawzy · edited Jul 16 '19 at 21:58 by idbrii

14

To precisely answer the OP's question, you would need to use "" instead of "_", but your answer probably applies to more of us in practice. I think replacing illegal characters with some legal one is more commonly done. – B H Jan 08 '16 at 20:27
60

I tested five methods from this question (timed loop of 100,000) and this method is the fastest one. The regular expression took 2nd place, and was 25% slower than this method. – Brain2000 Jul 15 '16 at 15:19
12

To address @BH 's comment, one can simply use string.Concat(name.Split(Path.GetInvalidFileNameChars())) – Michael Sutton Jun 07 '17 at 14:06
Surprisingly, the Split/Join code is about as fast as a foreach loop; it has the same performance. – Damian Vogel Oct 19 '21 at 20:12
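
To make the difference between removing and replacing concrete, here is a small sketch of the Split/Concat and Split/Join variants discussed in this answer and its comments. The hard-coded character set and all names are assumptions for illustration only; in real code you would pass Path.GetInvalidFileNameChars() instead. The third method is an optional variation, not part of the original answer: it uses StringSplitOptions.RemoveEmptyEntries so that a run of invalid characters becomes a single replacement.

```csharp
using System;

public static class SplitJoinDemo
{
    // Illustrative stand-in for Path.GetInvalidFileNameChars().
    private static readonly char[] Unwanted = { '<', '>', ':' };

    // Removes every unwanted character.
    public static string Remove(string s)
    {
        return string.Concat(s.Split(Unwanted));
    }

    // Replaces every unwanted character with one underscore each.
    public static string Replace(string s)
    {
        return string.Join("_", s.Split(Unwanted));
    }

    // Variation: empty segments are dropped, so a run of unwanted
    // characters yields a single underscore, and leading or trailing
    // unwanted characters disappear entirely.
    public static string ReplaceCollapsed(string s)
    {
        return string.Join("_", s.Split(Unwanted, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

For the input "a<>b:c", Remove gives "abc", Replace gives "a__b_c", and ReplaceCollapsed gives "a_b_c".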

Matthew Scharley · Accepted Answer · 2010-12-13T23:34:58.137

551

Try something like this instead:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());

foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), ""); 
}

But I have to agree with the comments, I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.

Edit: Or a potentially 'better' solution, using regular expressions:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");

Still, the question begs to be asked: why are you doing this in the first place?

answered Sep 28 '08 at 16:03 by Matthew Scharley · edited Dec 13 '10 at 23:34

1

I don't know if I should +1 your answer for having such an ill-performing solution that will push the user away from that path, or if I should +1 your answer for it actually answering his question! :) – user7116 Sep 28 '08 at 16:05
@Michael Stum: they get 'compiled' and should be some sort of state machine, but it would be naive to assume they are guaranteed to be any more efficient under the hood than a loop. – user7116 Sep 28 '08 at 16:10
On something the length of a path, it probably wouldn't make that much of a difference. On a longer string, I imagine the regex would be faster though. – Matthew Scharley Sep 28 '08 at 16:15
I'd stick to the non-regex solution: it's likely to be more efficient most of the time. If using the regex solution, change string.Format() to just "["+"...". If you're going to treat `illegal` as a file name without path after replacing special chars then you'd only need Path.GetInvalidFileNameChars(). – Rory Aug 19 '10 at 17:58
51

It's not necessary to append the two lists together. The illegal file name char list contains the illegal path char list and has a few more. Here are both lists cast to int:
File name chars: 34,60,62,124,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,58,42,63,92,47
Path chars: 34,60,62,124,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
– Sarel Botha Apr 11 '11 at 18:12
12

@sjbotha this may be true on Windows and Microsoft's implementation of .NET I'm not willing to make the same assumption for say mono running Linux. – Matthew Scharley Apr 17 '11 at 01:24
7

Regarding the first solution. Shouldn't a StringBuilder be more efficient than the string assignments? – epignosisx Dec 30 '11 at 15:53
1

If the string contains Chinese characters, the solution could fail. – PerlDev Jan 02 '12 at 05:13
@PerlDev: Have you actually tested that? `char`acters should be multi-byte compatible (`sizeof(char) == 2`), so it shouldn't be an issue. The regex solution should be fine also. – Matthew Scharley Jan 17 '12 at 08:47
3

What's the problem with sanitization, Bob Tables? – cregox Nov 08 '13 at 21:02
1

Correct me if I'm wrong, but calling both `Path.GetInvalidFileNameChars()` and `Path.GetInvalidPathChars()` is superfluous. `Path.GetInvalidFileNameChars()` alone should be sufficient. – Joey Adams Nov 13 '13 at 18:34
2

@JoeyAdams: see my reply to Sarel Botha. In short, one is a superset of the other on Windows. Personally, I'm not willing to make the same bet cross platform and C# and .NET in general is getting a wider and wider audience via Mono all the time. – Matthew Scharley Nov 15 '13 at 08:18
7

For what it's worth, @MatthewScharley, the Mono implementation of GetInvalidPathChars() returns only 0x00 and GetInvalidFileNameChars() returns only 0x00 and '/' when running on non-Windows platforms. On Windows, the list of invalid characters is much longer, and GetInvalidPathChars() is entirely duplicated inside GetInvalidFileNameChars(). This isn't going to change in the foreseeable future, so all you're really doing is doubling the amount of time this function takes to run because you're worried that the definition of a valid path will change sometime soon. Which it won't. – Warren Rumak Jan 27 '14 at 19:09
And let's be super-clear about this: This part of the Mono source code hasn't changed in EIGHT YEARS except for a minor perf improvement in 2007. – Warren Rumak Jan 27 '14 at 19:11
4

@Warren: Feel free to dedupe the resultant string if you really are worried, but lets be perfectly honest here: The difference between 20 and 40 iterations against a string the length of your average path (lets say 100 characters to be generous) will make exactly *no* difference to the runtime of your function. For all *practical* purposes, there's no need to worry about it. On the other hand, these two functions do serve different purposes and (in my mind at least), it would be perfectly reasonable for one function to not return a superset of the other for some given file system. – Matthew Scharley Jan 29 '14 at 05:40
2

How can doing double the work (whether it's deduplicating the array, or running through almost precisely the same array values twice) take "exactly no difference"? You know as well as I do that this is incorrect, so -don't- -say- -it-. We're trying to be an educational resource at Stackoverflow, not a place for rhetorical flourishes prompted by being told you're wrong. Let's be clear: What you're recommending here is effectively the same as the old Daily WTF canard about providing your own definition of TRUE and FALSE because you can't trust the compiler or libraries to always get it right. – Warren Rumak Jan 29 '14 at 16:43
4

GetInvalidFileNameChars() is always -- ALWAYS, you hear me -- going to include everything in GetInvalidPathChars() because it isn't possible for a file to have a character in that isn't valid in a path name. No file system allows this today, no file system ever will. And anyways, Microsoft's own documentation for these functions is very clear in stating that you should not expect the list of characters to be guaranteed as accurate because file systems might support something different anyways. – Warren Rumak Jan 29 '14 at 16:52
3

I'd probably side with Matthew here and just say that assumption is the mother of all mess ups. You are talking about optimising code which probably doesn't need optimizing over potential correctness. I'd take the correctness over the premature optimisation any day – Charleh Mar 15 '14 at 17:50
15

@Charleh this discussion is so unnecessary... code should always be optimized and there is no risk of this to be incorrect. A filename is a part of the path, too. So it is just illogical that `GetInvalidPathChars()` could contain characters that `GetInvalidFileNameChars()` wouldn't. You are not taking correctness over "premature" optimisation. You are simply using bad code. – Stefan Fabian Aug 09 '14 at 11:54
3

Personally i would prefer this way: `var invalid = Path.GetInvalidFileNameChars().Union(Path.GetInvalidPathChars()); foreach(char c in invalid) illegal = illegal.Replace(c.ToString(), "_");` – Tim Schmelter Sep 09 '15 at 12:20
3

I'm not sure why you guys are so nosy about why he wants to use it. There are various legit scenarios where this would be useful. Our app, for example, outputs xlsx files to email as reports, and if we don't validate the name on entry, you won't know until the scheduled time of creation of the report that the filename was invalid. We've had issues in the past where someone accidentally entered a less-than sign in the filename and saved it. Plus, some of our clients run Linux and some run Windows, so the allowed file names aren't the same. – John Lord Nov 30 '18 at 17:51
1

@JohnLord another common use case is dealing with filenames coming in from outside emails. You cannot control the file name being sent to you. You can, of course, throw away the original and replace it with something of your own devising, but there are cases where you want to retain as much of the original as possible for AI purposes. – Byron Jun 03 '20 at 17:00
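
Rather than arguing about whether one set is a superset of the other, you can ask the runtime you are actually on. The following is a rough sketch, not something taken from the answer; the class and method names are made up for illustration. PathOnlyChars returns any characters reported as invalid for paths but not for file names (an empty array means the file name set already covers the path set on this platform), and AllInvalidChars returns the de-duplicated union, along the lines of Tim Schmelter's comment.

```csharp
using System;
using System.IO;
using System.Linq;

public static class InvalidCharSets
{
    // Characters reported as invalid in paths but NOT in file names
    // on the platform this code is running on.
    public static char[] PathOnlyChars()
    {
        return Path.GetInvalidPathChars()
                   .Except(Path.GetInvalidFileNameChars())
                   .ToArray();
    }

    // Union of both sets, without duplicates.
    public static char[] AllInvalidChars()
    {
        return Path.GetInvalidFileNameChars()
                   .Union(Path.GetInvalidPathChars())
                   .ToArray();
    }
}
```

If PathOnlyChars() comes back empty, calling only Path.GetInvalidFileNameChars() loses nothing on that platform. If you prefer not to rely on that, AllInvalidChars() gives you the combined set without scanning the duplicates twice.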

Michael Minton · Answer 3 · 2015-09-24T02:47:17.330

219

I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.

private static string CleanFileName(string fileName)
{
    return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}

Update

Some comments indicate this method is not working for them so I've included a link to a DotNetFiddle snippet so you may validate the method.

https://dotnetfiddle.net/nw1SWY

answered Sep 12 '11 at 20:38 by Michael Minton · edited Sep 24 '15 at 02:47

4

This did not work for me. The method is not returning the clean string. It is returning the passed filename as it is. – Karan Jul 17 '13 at 06:29
What @Karan said, this does not work, the original string comes back. – Jon Mar 20 '14 at 15:26
You can actually do this with Linq like this though: `var invalid = new HashSet<char>(Path.GetInvalidPathChars()); return new string(originalString.Where(s => !invalid.Contains(s)).ToArray())`. Performance probably isn't great but that probably doesn't matter. – Casey Jul 09 '15 at 14:12
2

@Karan or Jon What input are you sending this function? See my edit for verification of this method. – Michael Minton Sep 24 '15 at 02:45
3

It's easy - guys were passing strings with valid chars. Upvoted for cool Aggregate solution. – Nickmaovich Jan 20 '16 at 13:10
Very good solution but only cleans up filename (as stated) but not the actual path as it is considering "\" as an illegal character and if you have something like "\\MyServer\e$\demo\Output\Test\1111_joe_soap.pdf", it returns "MyServere$demoOutputTest1111_joe_soap.pdf" – Thierry Mar 16 '17 at 10:45

score 93 · Answer 4 · edited Aug 22 '16 at 11:00


You can remove illegal chars using Linq like this:

var invalidChars = Path.GetInvalidFileNameChars();

var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();

EDIT
This is how it looks with the required edit mentioned in the comments:

var invalidChars = Path.GetInvalidFileNameChars();

string invalidCharsRemoved = new string(stringWithInvalidChars
  .Where(x => !invalidChars.Contains(x))
  .ToArray());

answered Nov 24 '10 at 19:41 by Gregor Slavec · edited Aug 22 '16 at 11:00 by Jan Willem B

1

I like this way : you keep only the allowed chars in the string (which is nothing else than a char array). – Dude Pascalou Jul 04 '12 at 09:36
6

I know that this is an old question, but this is an awesome answer. However, I wanted to add that in c# you cannot cast from char[] to string either implicitly or explicitly (crazy, I know) so you'll need to drop it into a string constructor. – JNYRanger Oct 21 '14 at 18:52
1

I haven't confirmed this, but I expect Path.GetInvalidPathChars() to be a superset of GetInvalidFileNameChars() and to cover both filenames and paths, so I would probably use that instead. – angularsen Jan 09 '15 at 22:11
3

@anjdreas actually Path.GetInvalidPathChars() seems to be a subset of Path.GetInvalidFileNameChars(), not the other way round. Path.GetInvalidPathChars() will not return '?', for example. – Rafael Costa Dec 30 '15 at 10:21
1

This is a good answer. I use both the filename list and the filepath list: `string cleanData = new string(data.Where(x => !Path.GetInvalidFileNameChars().Contains(x) && !Path.GetInvalidPathChars().Contains(x)).ToArray());` – goamn Nov 30 '17 at 05:28
You can also do var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars()) and make it O(n) instead of O(n^2). No reason why not. – Cesar Sep 24 '20 at 14:34

Lily Finley · Answer 5 · 2020-02-24T20:12:37.010

43

For file names:

var cleanFileName = string.Join("", fileName.Split(Path.GetInvalidFileNameChars()));

For full paths:

var cleanPath = string.Join("", path.Split(Path.GetInvalidPathChars()));

Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user supplied path is indeed a child of a directory the user should have access to.
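
The security note above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, method name and parameters are hypothetical, the comparison is a plain ordinal (case-sensitive) prefix check, and symbolic links are not resolved. Path.GetFullPath may also throw on input it considers malformed, which a real implementation would have to handle.

```csharp
using System;
using System.IO;

public static class PathGuard
{
    // Returns true when userSuppliedPath, resolved against rootDirectory,
    // still lies inside rootDirectory.
    public static bool IsInsideRoot(string rootDirectory, string userSuppliedPath)
    {
        // Expand the root and make sure it ends with a separator, so that
        // "/srv/data" does not accidentally match "/srv/database".
        string root = Path.GetFullPath(rootDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
        {
            root += Path.DirectorySeparatorChar;
        }

        // Expand the candidate; ".." segments are resolved here.
        string candidate = Path.GetFullPath(Path.Combine(root, userSuppliedPath));

        // Assumption: ordinal comparison. On a case-insensitive file
        // system this may reject paths that differ only in case.
        return candidate.StartsWith(root, StringComparison.Ordinal);
    }
}
```

For example, with a root of "/srv/data", the relative path "reports/a.txt" is accepted, while "../etc/passwd" is rejected because it expands to a location outside the root.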

answered Feb 11 '14 at 02:36 by Lily Finley · edited Feb 24 '20 at 20:12

René · Answer 6 · 2011-11-16T13:29:43.563

31

These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).

The Path.GetInvalidPathChars method is no better: its documentation contains the exact same remark.

answered Nov 16 '11 at 13:22 by René · edited Nov 16 '11 at 13:29

14

Then what is the point of Path.GetInvalidFileNameChars? I would expect it to return exactly the invalid characters for the current system, relying on .NET to know which filesystem I'm running on and presenting me the fitting invalid chars. If this is not the case and it just returns hardcoded characters, which are not reliable in the first place, this method should be removed since it has zero value. – Jan Jan 18 '14 at 18:08
1

I know this is a old comment but, @Jan you could want to write on another filesystem, maybe this is why there is a warning. – fantastik78 Jul 07 '15 at 13:59
4

@fantastik78 good point, but in this case I would want to have an additional enum argument to specify my remote FS. If this is too much maintenance effort (which is most likely case), this whole method is still a bad idea, because it gives you the wrong impression of safety. – Jan Sep 03 '15 at 10:33
1

@Jan I totally agree with you, I was just arguing about the warning. – fantastik78 Sep 03 '15 at 14:39
Interestingly this is a sort of "blacklisting" invalid chars. Would it not be better to "whitelist" only the known valid chars here?! Reminds me of the stupid "virusscanner" idea instead of whitelisting allowed apps.... – Bernhard Jul 10 '18 at 08:48
pay attention to the fact about filenames in the warning. It's actually telling you that it's not validating filenames themselves, just illegal characters. You could still have an illegal filename that is a reserved word. Also how would you whitelist an app? I would just make my virus have your filename and signature. – John Lord Nov 29 '18 at 17:04

score 21 · Answer 7 · edited Apr 17 '16 at 23:13


The best way to remove illegal characters from user input is to replace them using the Regex class. You can either create a method in the code-behind or validate on the client side with a RegularExpressionValidator control.

public string RemoveSpecialCharacters(string str)
{
    return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}

OR

<asp:RegularExpressionValidator ID="regxFolderName" 
                                runat="server" 
                                ErrorMessage="Enter folder name with  a-z A-Z0-9_" 
                                ControlToValidate="txtFolderName" 
                                Display="Dynamic" 
                                ValidationExpression="^[a-zA-Z0-9_]*$" 
                                ForeColor="Red" />

answered Sep 28 '13 at 06:35 by anomepani · edited Apr 17 '16 at 23:13 by Koopakiller

6

IMHO this solution is much better than others Instead of searching for all invalid chars just define which are valid. – igorushi Sep 29 '15 at 07:55
2

For [POSIX "Fully portable filenames"](https://en.wikipedia.org/wiki/Filename), use `"[^a-zA-Z0-9_.-]+"` – CrazyTim Jul 22 '21 at 23:20
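
To show what the whitelist approach does in practice, here is a short sketch comparing the pattern from the answer with the one from CrazyTim's comment. The class and method names are hypothetical. Note one side effect of the stricter pattern: it also replaces the dot, so the file extension is lost.

```csharp
using System.Text.RegularExpressions;

public static class WhitelistSanitizer
{
    // Pattern from the answer: letters, digits and underscore only.
    private static readonly Regex Strict = new Regex("[^a-zA-Z0-9_]+");

    // Pattern from the comment: additionally allows '.' and '-'.
    private static readonly Regex Portable = new Regex("[^a-zA-Z0-9_.-]+");

    // Each run of disallowed characters becomes a single underscore.
    public static string SanitizeStrict(string s)
    {
        return Strict.Replace(s, "_");
    }

    public static string SanitizePortable(string s)
    {
        return Portable.Replace(s, "_");
    }
}
```

For the input "my report (final).txt", SanitizeStrict returns "my_report_final_txt" and SanitizePortable returns "my_report_final_.txt".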

score 18 · Answer 8 · edited May 23 '17 at 12:26


For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should evaluate whether you really want to remove the offending characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).

answered Sep 28 '08 at 15:56 by user7116 · edited May 23 '17 at 12:26 by Community

I especially agree with the second advice. – OregonGhost Sep 28 '08 at 15:59
4

I would normally agree with the second, but I have a program which generates a filename and which may contain illegal characters in some situations. Since *my program* is generating the illegal filenames, I think it's appropriate to remove/replace those characters. (Just pointing out a valid use-case) – JDB May 09 '13 at 15:48

Jeff Yates · Answer 9 · 2012-02-13T15:01:48.180

15

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = string.Format(
                   "[{0}]",
                   Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.
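
For completeness, this is roughly what the whole thing could look like when wrapped in a helper, with the Replace call included. It is a sketch only; the class name, method name and the replacement parameter are made up for illustration. The regex is built once in a static field so repeated calls do not rebuild it.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class RegexFileNameCleaner
{
    // Character class containing every char the runtime reports as
    // invalid in file names, escaped so that regex metacharacters such
    // as '\' or '*' are treated literally.
    private static readonly Regex InvalidChars = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]",
        RegexOptions.CultureInvariant);

    // Replaces each invalid character with the given replacement text
    // (pass "" to remove the characters instead).
    public static string Clean(string fileName, string replacement)
    {
        return InvalidChars.Replace(fileName, replacement);
    }
}
```

Usage would be something like RegexFileNameCleaner.Clean(name, "") to remove the characters, or RegexFileNameCleaner.Clean(name, "_") to replace them.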

answered Sep 28 '08 at 18:45 by Jeff Yates · edited Feb 13 '12 at 15:01

Strange, it has been working for me. I'll double-check it when I get chance. Can you be more specific and explain what exactly isn't working for you? – Jeff Yates Feb 08 '10 at 15:56
1

It won't work (properly at the very least) because you aren't escaping the path characters properly, and some of them have a special meaning. Refer to my answer for how to do that. – Matthew Scharley Apr 08 '10 at 21:39
@Jeff: Your version is still better than Matthew's, if you slightly modify it. Refer to my answer on how. – Jan Feb 13 '12 at 08:28
3

I would also add some other invalid file name patterns that can be found on [MSDN](http://msdn.microsoft.com/en-us/library/aa365247.aspx#namespaces) and extend your solution to the following regex: `new Regex(String.Format("^(CON|PRN|AUX|NUL|CLOCK\$|COM[1-9]|LPT[1-9])(?=\..|$)|(^(\.+|\s+)$)|((\.+|\s+)$)|([{0}])", Regex.Escape(new String(Path.GetInvalidFileNameChars()))), RegexOptions.Compiled | RegexOptions.Singleline | RegexOptions.CultureInvariant);` – yar_shukan Sep 10 '14 at 14:46
Small syntax improvement for @yar_shukan comment: Add `@` before string expression, if you faced with error "Unrecognized escape sequence", i.e. `String.Format(@"^CON| ... )"` – hotenov Sep 26 '20 at 09:29

score 13 · Answer 10 · answered Feb 15 '11 at 14:21

I absolutely prefer the idea of Jeff Yates. It will work perfectly, if you slightly modify it:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

The improvement is just to escape the automatically generated regex.

score 12 · Answer 11 · answered Oct 19 '10 at 16:33

Here's a code snippet that should help for .NET 3 and higher.

using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}

score 8 · Answer 12 · answered Jun 19 '12 at 12:16


Most solutions above combine the illegal chars for both path and filename, which is wrong (even when both calls currently return the same set of chars). I would first split the path+filename into path and filename, then apply the appropriate set to each of them, and then combine the two again.

answered Jun 19 '12 at 12:16 by wvd_vegt

+1: Very true. Today, working in .NET 4.0, the regex solution from the top answer nuked all backslashes in a full path. So I made a regex for the dir path and a regex for just the filename, cleaned separately and recombined – dario_ramos May 22 '13 at 21:03
That might be true but this doesn't answer the question. I'm not sure a vague 'I'd do it like this' is terribly helpful compared to some of the complete solutions already in here (see for example Lilly's answer, below) – Ian Grainger May 12 '16 at 11:20
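
Since the answer describes the approach without code, here is one possible sketch of it. All names are hypothetical. It splits at the last directory separator by hand rather than calling Path.GetDirectoryName, because that method can throw on invalid characters in some .NET versions. Note that, as mentioned in the comments on the question, Path.GetInvalidPathChars() may not include characters such as * or ?, so those can survive in the directory part.

```csharp
using System;
using System.IO;

public static class FullPathCleaner
{
    public static string Clean(string fullPath)
    {
        // Position of the last directory separator, or -1 if there is none.
        int cut = fullPath.LastIndexOfAny(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar });

        // Directory part keeps its trailing separator; it is empty when
        // the input contains no separator at all.
        string directory = fullPath.Substring(0, cut + 1);
        string fileName = fullPath.Substring(cut + 1);

        // Apply the matching character set to each part.
        directory = string.Concat(directory.Split(Path.GetInvalidPathChars()));
        fileName = string.Concat(fileName.Split(Path.GetInvalidFileNameChars()));

        return directory + fileName;
    }
}
```

This keeps the separators in the directory part intact, which avoids the problem dario_ramos describes where a file-name regex removed every backslash from a full path.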

score 6 · Answer 13 · answered Oct 01 '14 at 18:40

If you remove the invalid characters, or replace them all with the same single character, you can get collisions:

<abc -> abc
>abc -> abc

Here is a simple method to avoid this:

public static string ReplaceInvalidFileNameChars(string s)
{
    char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
    foreach (char c in invalidFileNameChars)
        s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
    return s;
}

The result:

 <abc -> [1]abc
 >abc -> [2]abc

score 6 · Answer 14 · answered Feb 09 '15 at 21:19


This seems to be O(n) and does not spend too much memory on strings:

    private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string RemoveInvalidFileNameChars(string name)
    {
        if (!name.Any(c => invalidFileNameChars.Contains(c))) {
            return name;
        }

        return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
    }

answered Feb 09 '15 at 21:19 by Alexey F

1

I don't think it's O(n) when you use the 'Any' function. – II ARROWS Aug 30 '16 at 10:42
@IIARROWS and what is it in your opinion? – Alexey F Aug 30 '16 at 12:32
I don't know, it just didn't felt like that when I wrote my comment... now that I tried to calculate it, looks like you're right. – II ARROWS Aug 30 '16 at 19:48
I selected this one because of your performance consideration. Thanks. – Berend Engelbrecht Oct 16 '19 at 09:54
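
If you want to replace rather than remove, the same idea works in a single pass with a StringBuilder, which is what one of the comments on the accepted answer asks about. This is a sketch under my own naming (the class, method and parameter names are not from the answer).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SinglePassCleaner
{
    // Hash set gives constant-time lookups per character.
    private static readonly HashSet<char> Invalid =
        new HashSet<char>(Path.GetInvalidFileNameChars());

    // Walks the input once, copying valid characters and substituting
    // the replacement for invalid ones.
    public static string Replace(string name, char replacement)
    {
        var builder = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            builder.Append(Invalid.Contains(c) ? replacement : c);
        }
        return builder.ToString();
    }
}
```

Unlike the loop of string.Replace calls, this allocates one builder and one result string regardless of how many different invalid characters occur in the input.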

score 5 · Answer 15 · answered Jul 25 '18 at 12:14

A file name must not contain characters from Path.GetInvalidPathChars(), the + and # symbols, or certain reserved device names. We combined all of these checks into one class:

public static class FileNameExtensions
{
    private static readonly Lazy<string[]> InvalidFileNameChars =
        new Lazy<string[]>(() => Path.GetInvalidPathChars()
            .Union(Path.GetInvalidFileNameChars()
            .Union(new[] { '+', '#' })).Select(c => c.ToString(CultureInfo.InvariantCulture)).ToArray());


    private static readonly HashSet<string> ProhibitedNames = new HashSet<string>
    {
        @"aux",
        @"con",
        @"clock$",
        @"nul",
        @"prn",

        @"com1",
        @"com2",
        @"com3",
        @"com4",
        @"com5",
        @"com6",
        @"com7",
        @"com8",
        @"com9",

        @"lpt1",
        @"lpt2",
        @"lpt3",
        @"lpt4",
        @"lpt5",
        @"lpt6",
        @"lpt7",
        @"lpt8",
        @"lpt9"
    };

    public static bool IsValidFileName(string fileName)
    {
        return !string.IsNullOrWhiteSpace(fileName)
            && fileName.All(o => !IsInvalidFileNameChar(o))
            && !IsProhibitedName(fileName);
    }

    public static bool IsProhibitedName(string fileName)
    {
        return ProhibitedNames.Contains(fileName.ToLower(CultureInfo.InvariantCulture));
    }

    private static string ReplaceInvalidFileNameSymbols([CanBeNull] this string value, string replacementValue)
    {
        if (value == null)
        {
            return null;
        }

        return InvalidFileNameChars.Value.Aggregate(new StringBuilder(value),
            (sb, currentChar) => sb.Replace(currentChar, replacementValue)).ToString();
    }

    public static bool IsInvalidFileNameChar(char value)
    {
        return InvalidFileNameChars.Value.Contains(value.ToString(CultureInfo.InvariantCulture));
    }

    public static string GetValidFileName([NotNull] this string value)
    {
        return GetValidFileName(value, @"_");
    }

    public static string GetValidFileName([NotNull] this string value, string replacementValue)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException(@"value should be non empty", nameof(value));
        }

        if (IsProhibitedName(value))
        {
            return (string.IsNullOrWhiteSpace(replacementValue) ? @"_" : replacementValue) + value; 
        }

        return ReplaceInvalidFileNameSymbols(value, replacementValue);
    }

    public static string GetFileNameError(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return CommonResources.SelectReportNameError;
        }

        if (IsProhibitedName(fileName))
        {
            return CommonResources.FileNameIsProhibited;
        }

        var invalidChars = fileName.Where(IsInvalidFileNameChar).Distinct().ToArray();

        if(invalidChars.Length > 0)
        {
            return string.Format(CultureInfo.CurrentCulture,
                invalidChars.Length == 1 ? CommonResources.InvalidCharacter : CommonResources.InvalidCharacters,
                StringExtensions.JoinQuoted(@",", @"'", invalidChars.Select(c => c.ToString(CultureInfo.CurrentCulture))));
        }

        return string.Empty;
    }
}

The GetValidFileName method replaces every invalid character with the replacement value, which defaults to _.
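
One thing to be aware of: IsProhibitedName above compares the whole string, so a name like "nul.txt" would pass. Microsoft's file naming documentation says to also avoid the reserved device names when they are followed by an extension. A possible sketch of a check that covers this case follows; the class and method names are hypothetical, and the list of names is the one used in the class above.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // Same names as in the answer, compared case-insensitively.
    private static readonly HashSet<string> Devices =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL", "CLOCK$",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // True when the part of the name before the first dot is a
    // reserved device name.
    public static bool IsReserved(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string baseName = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Devices.Contains(baseName);
    }
}
```

With this, "con" and "NUL.txt" are both reported as reserved, while "console.txt" and "COM10" are not.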

score 5 · Answer 16 · answered Mar 12 '09 at 16:14


Throw an exception.

if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1)
{
    throw new ArgumentException();
}

answered Mar 12 '09 at 16:14 by mirezus

I don't think throwing an exception is valuable here as the question states about removing the offending characters, not simply throwing an exception. – PHenry Nov 20 '20 at 01:30

Simant · Answer 17 · 2021-02-06T20:02:51.530

If you have to use the method in many places in a project, you could also make an extension method and call it anywhere in the project for strings.

public static class StringExtension
{
    public static string RemoveInvalidChars(this string originalString)
    {
        // Null or empty input yields an empty string.
        if (string.IsNullOrEmpty(originalString))
        {
            return string.Empty;
        }

        return string.Concat(originalString.Split(Path.GetInvalidFileNameChars()));
    }
}

You can call the above extension method as:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string afterIllegalChars = illegal.RemoveInvalidChars();

Because every string is a path. Or why would it make sense to extend `string` for just one special case? — Andreas, Feb 05 '21 at 14:54

score 4 · Answer 18 · answered Dec 07 '13 at 13:21

I wrote this monster for fun; it lets you round-trip:

public static class FileUtility
{
    private const char PrefixChar = '%';
    private static readonly int MaxLength;
    private static readonly Dictionary<char,char[]> Illegals;
    static FileUtility()
    {
        List<char> illegal = new List<char> { PrefixChar };
        illegal.AddRange(Path.GetInvalidFileNameChars());
        MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
        Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
    }

    public static string FilenameEncode(string s)
    {
        var builder = new StringBuilder();
        char[] replacement;
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if(Illegals.TryGetValue(c,out replacement))
                {
                    builder.Append(PrefixChar);
                    builder.Append(replacement);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static string FilenameDecode(string s)
    {
        var builder = new StringBuilder();
        char[] buffer = new char[MaxLength];
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if (c == PrefixChar)
                {
                    reader.Read(buffer, 0, MaxLength);
                    var encoded =(char) ParseCharArray(buffer);
                    builder.Append(encoded);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static int ParseCharArray(char[] buffer)
    {
        int result = 0;
        foreach (char t in buffer)
        {
            int digit = t - '0';
            if ((digit < 0) || (digit > 9))
            {
                throw new ArgumentException("Input string was not in the correct format");
            }
            result *= 10;
            result += digit;
        }
        return result;
    }
}

I like this because it avoids having two different strings creating the same resulting path. — Kim, Jan 29 '14 at 16:25

score 3 · Answer 19 · answered Sep 28 '08 at 16:07

I think it is much easier to validate using a regex that specifies which characters are allowed, instead of trying to check for all bad characters. See these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, search for a "regular expression editor"; they help a lot. Some of them even generate the C# code for you.
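
To illustrate the allow-list idea, here is a minimal sketch (not the answerer's code). The class name AllowListSanitizer and the particular set of permitted characters (ASCII letters, digits, space, dot, underscore, hyphen) are assumptions chosen for the example; adjust the pattern to whatever your application should accept.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: validates or cleans a name against an allow-list.
public static class AllowListSanitizer
{
    // Assumed allow-list: anything that is NOT one of these characters is rejected.
    private static readonly Regex Disallowed = new Regex(@"[^A-Za-z0-9 ._-]");

    // Returns true only if the name is non-empty and every character is allowed.
    public static bool IsAllowed(string name)
    {
        return !string.IsNullOrEmpty(name) && !Disallowed.IsMatch(name);
    }

    // Replaces every character outside the allow-list with the replacement text.
    public static string Sanitize(string name, string replacement = "_")
    {
        return Disallowed.Replace(name ?? string.Empty, replacement);
    }
}
```

An allow-list like this is stricter than Path.GetInvalidFileNameChars(): it also rejects characters the file system would accept, such as accented letters, so it trades flexibility for predictability.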

Sandor Davidhazi

Given that .net is a framework that is intended to allow programs to run on multiple platforms (e.g. Linux/Unix as well as Windows), I feel Path.GetInvalidFileNameChars() is best since it will contain the knowledge of what is or isn't valid on the filesystem your program is being run on. Even if your program will never run on Linux (maybe it's full of WPF code), there's always the chance some new Windows filesystem will come along in the future and have different valid/invalid chars. Rolling your own with regex is reinventing the wheel, and shifting a platform issue into your own code. – Daniel Scott Oct 03 '18 at 23:54
I agree with your advice on online regex editors/testers though. I find them invaluable (since regexes are tricky things, and full of subtlety that can trip you up easily, giving you a regex that behaves in some wildly unexpected way with edge cases). My favourite is https://regex101.com (I like how it breaks the regex down and shows you clearly what it expects to match). I also quite like https://www.debuggex.com as it's got a compact visual representation of match groups and character classes and whatnot. – Daniel Scott Oct 04 '18 at 00:05

Daniel Scott · Answer 20 · 2017-09-08T00:20:26.827

Scanning over the answers here, they all** seem to involve using a char array of invalid filename characters.

Granted, this may be micro-optimising - but for the benefit of anyone who might be looking to check a large number of values for being valid filenames, it's worth noting that building a hashset of invalid chars will bring about notably better performance.

I have been very surprised (shocked) in the past just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, it's a ridiculously low number (about 5-7 items from memory). With most other simple data (object references, numbers etc) the magic crossover seems to be around 20 items.

On Windows, Path.GetInvalidFileNameChars() returns about 40 invalid characters. I searched today and found quite a good benchmark here on Stack Overflow showing that a hashset takes a little over half the time of an array/list for 40 items: https://stackoverflow.com/a/10762995/949129
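
If you want to check the claim on your own machine, a rough sketch like the following compares the two lookups. The class and method names are made up for this example, and a Stopwatch loop is only a crude measurement; results depend on the runtime version and hardware, so treat any numbers as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical comparison harness, not part of the helper class below.
public static class LookupComparison
{
    // Counts invalid characters with a linear scan over the array for each character.
    public static int CountWithArray(string text, char[] invalid)
    {
        int count = 0;
        foreach (char c in text)
            if (Array.IndexOf(invalid, c) >= 0) count++;
        return count;
    }

    // Counts invalid characters with one hash lookup per character.
    public static int CountWithHashSet(string text, HashSet<char> invalid)
    {
        int count = 0;
        foreach (char c in text)
            if (invalid.Contains(c)) count++;
        return count;
    }

    // Times both approaches and prints the elapsed times.
    public static void Run(string text, int iterations)
    {
        char[] array = Path.GetInvalidFileNameChars();
        var set = new HashSet<char>(array);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) CountWithArray(text, array);
        Console.WriteLine("array:   {0}", sw.Elapsed);

        sw.Restart();
        for (int i = 0; i < iterations; i++) CountWithHashSet(text, set);
        Console.WriteLine("hashset: {0}", sw.Elapsed);
    }
}
```

Calling LookupComparison.Run(someFileName, 1000000) prints one timing line per approach.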

Here's the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it's there as a cute bonus.

Additional bonus method "IsValidLocalPath" too :)

(** those which don't use regular expressions)

public static class PathExtensions
{
    private static HashSet<char> _invalidFilenameChars;
    private static HashSet<char> InvalidFilenameChars
    {
        get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
    }


    /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
    /// specified replacement character.</summary>
    /// <param name="text">Text to make into a valid filename. The same string is returned if 
    /// it is valid already.</param>
    /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
    /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
    /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
    public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        HashSet<char> invalids = InvalidFilenameChars;
        bool changed = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (invalids.Contains(c))
            {
                changed = true;
                char repl = replacement ?? '\0';
                if (fancyReplacements)
                {
                    if (c == '"') repl = '”'; // U+201D right double quotation mark
                    else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                    else if (c == '/') repl = '⁄'; // U+2044 fraction slash
                }
                if (repl != '\0')
                    sb.Append(repl);
            }
            else
                sb.Append(c);
        }

        if (sb.Length == 0)
            return "_";

        return changed ? sb.ToString() : text;
    }


    /// <summary>
    /// Returns TRUE if the specified path is a valid, local filesystem path.
    /// </summary>
    /// <param name="pathString"></param>
    /// <returns></returns>
    public static bool IsValidLocalPath(this string pathString)
    {
        // From solution at https://stackoverflow.com/a/11636052/949129
        Uri pathUri;
        Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
        return isValidUri && pathUri != null && pathUri.IsLoopback;
    }
}

score 3 · Answer 21 · answered Sep 29 '20 at 14:06

Here is my small contribution: a method that replaces characters in place in a single char array, without creating intermediate strings or a StringBuilder. It's fast, easy to understand, and a good alternative to the other approaches in this post.

private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
  get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}

private static string ReplaceInvalidChars(string fileName, string newValue)
{
  char newChar = newValue[0];

  char[] chars = fileName.ToCharArray();
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
      chars[i] = newChar;
  }

  return new string(chars);
}

You can call it like this:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string legal = ReplaceInvalidChars(illegal, "_");

On Windows, this returns:

_M____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

It's worth noting that this method always replaces invalid chars with the given value and never removes them. If you want to remove invalid chars, this alternative will do the trick:

private static string RemoveInvalidChars(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
  bool remove = newChar == char.MinValue;

  char[] chars = fileName.ToCharArray();
  char[] newChars = new char[chars.Length];
  int i2 = 0;
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
    {
      if (!remove)
        newChars[i2++] = newChar;
    }
    else
      newChars[i2++] = c;

  }

  return new string(newChars, 0, i2);
}

BENCHMARK

If performance is what you are after, I ran timed tests of most methods found in this post. Some of these methods don't replace with a given char, since the OP was asking to clean the string. I added tests that replace with a given char, and others that replace with an empty string for scenarios where you only need to remove the unwanted chars. The code used for this benchmark is at the end, so you can run your own tests.

Note: Methods Test1 and Test2 are both proposed in this post.

First Run

replacing with '_', 1000000 iterations

Results:

============Test1===============
Elapsed=00:00:01.6665595
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test2===============
Elapsed=00:00:01.7526835
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test3===============
Elapsed=00:00:05.2306227
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test4===============
Elapsed=00:00:14.8203696
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test5===============
Elapsed=00:00:01.8273760
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test6===============
Elapsed=00:00:05.4249985
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test7===============
Elapsed=00:00:07.5653833
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test8===============
Elapsed=00:12:23.1410106
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test9===============
Elapsed=00:00:02.1016708
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test10===============
Elapsed=00:00:05.0987225
Result=M ary had a little lamb.

============Test11===============
Elapsed=00:00:06.8004289
Result=M ary had a little lamb.

Second Run

removing invalid chars, 1000000 iterations

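Note: Test1 does not remove characters, it only replaces them. With an empty replacement it substitutes char.MinValue ('\0'), which is why its result below still has a placeholder at each invalid position.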

Results:

============Test1===============
Elapsed=00:00:01.6945352
Result= M     a ry  h  ad    a          li tt le   la mb.

============Test2===============
Elapsed=00:00:01.4798049
Result=M ary had a little lamb.

============Test3===============
Elapsed=00:00:04.0415688
Result=M ary had a little lamb.

============Test4===============
Elapsed=00:00:14.3397960
Result=M ary had a little lamb.

============Test5===============
Elapsed=00:00:01.6782505
Result=M ary had a little lamb.

============Test6===============
Elapsed=00:00:04.9251707
Result=M ary had a little lamb.

============Test7===============
Elapsed=00:00:07.9562379
Result=M ary had a little lamb.

============Test8===============
Elapsed=00:12:16.2918943
Result=M ary had a little lamb.

============Test9===============
Elapsed=00:00:02.0770277
Result=M ary had a little lamb.

============Test10===============
Elapsed=00:00:05.2721232
Result=M ary had a little lamb.

============Test11===============
Elapsed=00:00:05.2802903
Result=M ary had a little lamb.

BENCHMARK RESULTS

Methods Test1, Test2 and Test5 are the fastest. Method Test8 is the slowest, most likely because it constructs a new Regex with RegexOptions.Compiled on every call.

CODE

Here's the complete code of the benchmark:

private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
  get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}

private static string _invalidCharsValue;
private static string InvalidCharsValue
{
  get { return _invalidCharsValue ?? (_invalidCharsValue = new string(Path.GetInvalidFileNameChars())); }
}

private static char[] _invalidChars;
private static char[] InvalidChars
{
  get { return _invalidChars ?? (_invalidChars = Path.GetInvalidFileNameChars()); }
}

static void Main(string[] args)
{
  string testPath = "\"M <>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

  int max = 1000000;
  string newValue = "";

  TimeBenchmark(max, Test1, testPath, newValue);
  TimeBenchmark(max, Test2, testPath, newValue);
  TimeBenchmark(max, Test3, testPath, newValue);
  TimeBenchmark(max, Test4, testPath, newValue);
  TimeBenchmark(max, Test5, testPath, newValue);
  TimeBenchmark(max, Test6, testPath, newValue);
  TimeBenchmark(max, Test7, testPath, newValue);
  TimeBenchmark(max, Test8, testPath, newValue);
  TimeBenchmark(max, Test9, testPath, newValue);
  TimeBenchmark(max, Test10, testPath, newValue);
  TimeBenchmark(max, Test11, testPath, newValue);

  Console.Read();
}

private static void TimeBenchmark(int maxLoop, Func<string, string, string> func, string testString, string newValue)
{
  var sw = new Stopwatch();
  sw.Start();
  string result = string.Empty;

  for (int i = 0; i < maxLoop; i++)
    result = func?.Invoke(testString, newValue);

  sw.Stop();

  Console.WriteLine($"============{func.Method.Name}===============");
  Console.WriteLine("Elapsed={0}", sw.Elapsed);
  Console.WriteLine("Result={0}", result);
  Console.WriteLine("");
}

private static string Test1(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];

  char[] chars = fileName.ToCharArray();
  for (int i = 0; i < chars.Length; i++)
  {
    if (InvalidCharsHash.Contains(chars[i]))
      chars[i] = newChar;
  }

  return new string(chars);
}

private static string Test2(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
  bool remove = newChar == char.MinValue;

  char[] chars = fileName.ToCharArray();
  char[] newChars = new char[chars.Length];
  int i2 = 0;
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
    {
      if (!remove)
        newChars[i2++] = newChar;
    }
    else
      newChars[i2++] = c;

  }

  return new string(newChars, 0, i2);
}

private static string Test3(string filename, string newValue)
{
  foreach (char c in InvalidCharsValue)
  {
    filename = filename.Replace(c.ToString(), newValue);
  }

  return filename;
}

private static string Test4(string filename, string newValue)
{
  Regex r = new Regex(string.Format("[{0}]", Regex.Escape(InvalidCharsValue)));
  filename = r.Replace(filename, newValue);
  return filename;
}

private static string Test5(string filename, string newValue)
{
  return string.Join(newValue, filename.Split(InvalidChars));
}

private static string Test6(string fileName, string newValue)
{
  return InvalidChars.Aggregate(fileName, (current, c) => current.Replace(c.ToString(), newValue));
}

private static string Test7(string fileName, string newValue)
{
  string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
  return Regex.Replace(fileName, regex, newValue, RegexOptions.Compiled);
}

private static string Test8(string fileName, string newValue)
{
  string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
  Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
  return removeInvalidChars.Replace(fileName, newValue);
}

private static string Test9(string fileName, string newValue)
{
  StringBuilder sb = new StringBuilder(fileName.Length);
  bool changed = false;

  for (int i = 0; i < fileName.Length; i++)
  {
    char c = fileName[i];
    if (InvalidCharsHash.Contains(c))
    {
      changed = true;
      sb.Append(newValue);
    }
    else
      sb.Append(c);
  }

  if (sb.Length == 0)
    return newValue;

  return changed ? sb.ToString() : fileName;
}

private static string Test10(string fileName, string newValue)
{
  if (!fileName.Any(c => InvalidChars.Contains(c)))
  {
    return fileName;
  }

  return new string(fileName.Where(c => !InvalidChars.Contains(c)).ToArray());
}

private static string Test11(string fileName, string newValue)
{
  string invalidCharsRemoved = new string(fileName
    .Where(x => !InvalidChars.Contains(x))
    .ToArray());

  return invalidCharsRemoved;
}

score 2 · Answer 22 · answered Feb 22 '18 at 11:25

public static class StringExtensions
{
    public static string RemoveUnnecessary(this string source)
    {
        // Build one character class from both invalid-character sets.
        string invalidChars = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
        Regex reg = new Regex(string.Format("[{0}]", Regex.Escape(invalidChars)));
        return reg.Replace(source, "");
    }
}

You can then call it on any string, e.g. fileName.RemoveUnnecessary().

score 2 · Answer 23 · answered Dec 02 '18 at 01:49

A one-liner to clean a string of any characters that are illegal in Windows file names:

public static string CleanIllegalName(string p_testName) => new Regex(string.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars())))).Replace(p_testName, "");

Fabske · Answer 24 · 2020-10-23T08:22:01.520

I've rolled my own method, which seems to be a lot faster than the others posted here (especially the regex, which is so sloooooow), but I haven't tested all the methods posted.

https://dotnetfiddle.net/haIXiY

The first method (mine) and the second (also mine, but an older one) also do an added check on backslashes, so the benchmarks are not perfect, but they are just meant to give you an idea.

Result on my laptop (for 100 000 iterations):

StringHelper.RemoveInvalidCharacters 1: 451 ms  
StringHelper.RemoveInvalidCharacters 2: 7139 ms  
StringHelper.RemoveInvalidCharacters 3: 2447 ms  
StringHelper.RemoveInvalidCharacters 4: 3733 ms  
StringHelper.RemoveInvalidCharacters 5: 11689 ms  (==> Regex!)

The fastest method:

public static string RemoveInvalidCharacters(string content, char replace = '_', bool doNotReplaceBackslashes = false)
{
    if (string.IsNullOrEmpty(content))
        return content;

    var idx = content.IndexOfAny(InvalidCharacters);
    if (idx >= 0)
    {
        var sb = new StringBuilder(content);
        while (idx >= 0)
        {
            if (sb[idx] != '\\' || !doNotReplaceBackslashes)
                sb[idx] = replace;
            idx = content.IndexOfAny(InvalidCharacters, idx+1);
        }
        return sb.ToString();
    }
    return content;
}

The method doesn't compile as is because of the InvalidCharacters property; check the fiddle for the full code.
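
For readers who cannot open the fiddle, here is a minimal compilable sketch. It assumes InvalidCharacters is simply a cached copy of Path.GetInvalidFileNameChars(); the fiddle's actual definition may differ. The class name StringHelper is taken from the benchmark output above.

```csharp
using System.IO;
using System.Text;

public static class StringHelper
{
    // Assumption: a cached array of invalid file-name characters (the fiddle may define this differently).
    private static readonly char[] InvalidCharacters = Path.GetInvalidFileNameChars();

    public static string RemoveInvalidCharacters(string content, char replace = '_', bool doNotReplaceBackslashes = false)
    {
        if (string.IsNullOrEmpty(content))
            return content;

        // Fast path: nothing to replace, so return the original string without allocating.
        int idx = content.IndexOfAny(InvalidCharacters);
        if (idx < 0)
            return content;

        // Replacements keep the length unchanged, so indexes found in content stay valid in sb.
        var sb = new StringBuilder(content);
        while (idx >= 0)
        {
            if (sb[idx] != '\\' || !doNotReplaceBackslashes)
                sb[idx] = replace;
            idx = content.IndexOfAny(InvalidCharacters, idx + 1);
        }
        return sb.ToString();
    }
}
```

Because the set of invalid characters depends on the operating system, the same input can produce different output on Windows and on Linux.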

score 1 · Answer 25 · answered Nov 18 '13 at 13:28

public static bool IsValidFilename(string testName)
{
    return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
}

mbdavis

score 0 · Answer 26 · answered Sep 19 '14 at 15:04

This will do what you want, and avoid collisions:

 static string SanitiseFilename(string key)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        var sb = new StringBuilder();
        foreach (var c in key)
        {
            var invalidCharIndex = -1;
            for (var i = 0; i < invalidChars.Length; i++)
            {
                if (c == invalidChars[i])
                {
                    invalidCharIndex = i;
                }
            }
            if (invalidCharIndex > -1)
            {
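                // Fixed-width index, so "_1" followed by a literal "0" cannot be confused with "_10".
                sb.Append("_").Append(invalidCharIndex.ToString("D2"));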
                continue;
            }

            if (c == '_')
            {
                sb.Append("__");
                continue;
            }

            sb.Append(c);
        }
        return sb.ToString();

    }

Suplanus · Answer 27 · 2015-07-07T11:45:42.613

I think the question is not yet fully answered. The answers only describe cleaning a file name or a path, not both. Here is my solution:

private static string CleanPath(string path)
{
    string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
    Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    List<string> split = path.Split('\\').ToList();
    string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"\"));
    returnValue = returnValue.TrimEnd('\\');
    return returnValue;
}

score 0 · Answer 28 · answered Jun 14 '18 at 07:11

I created an extension method that combines several suggestions:

Holding illegal characters in a hash set
Filtering out characters outside the printable ASCII range (below 32 or above 127), since Path.GetInvalidFileNameChars does not include all invalid characters possible with ASCII codes from 0 to 255. See here and MSDN
Possibility to define the replacement character

Source:

public static class FileNameCorrector
{
    private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string ToValidFileName(this string name, char replacement = '\0')
    {
        var builder = new StringBuilder();
        foreach (var cur in name)
        {
            if (cur > 31 && cur < 128 && !invalid.Contains(cur))
            {
                builder.Append(cur);
            }
            else if (replacement != '\0')
            {
                builder.Append(replacement);
            }
        }

        return builder.ToString();
    }
}

Hans-Peter Kalb · Answer 29 · 2020-05-15T15:55:07.620

Here is a function that replaces all illegal characters in a file name with a replacement character:

public static string ReplaceIllegalFileChars(string FileNameWithoutPath, char ReplacementChar)
{
  const string IllegalFileChars = "*?/\\:<>|\"";
  StringBuilder sb = new StringBuilder(FileNameWithoutPath.Length);
  char c;

  for (int i = 0; i < FileNameWithoutPath.Length; i++)
  {
    c = FileNameWithoutPath[i];
    if (IllegalFileChars.IndexOf(c) >= 0)
    {
      c = ReplacementChar;
    }
    sb.Append(c);
  }
  return (sb.ToString());
}

For example the underscore can be used as a replacement character:

NewFileName = ReplaceIllegalFileChars(FileName, '_');

In addition to the answer you've provided, please consider providing a brief explanation of why and how this fixes the issue. — jtate, May 14 '20 at 13:18

score -7 · Answer 30 · answered Jan 15 '14 at 21:24

Or you can just do

[YOUR STRING].Replace('\\', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();

Danny Fallas
