111

I've got a string like "Foo: Bar" that I want to use as a filename, but on Windows the ":" char isn't allowed in a filename.

Is there a method that will turn "Foo: Bar" into something like "Foo- Bar"?

Ken

16 Answers

177

Try something like this:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}

Edit:

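Since GetInvalidFileNameChars() returns dozens of chars on Windows (41 in the declaration quoted in the comments), a StringBuilder can be a better choice than a simple string: every string.Replace call that actually replaces something allocates a new string, so the original version can take longer and consume more memory when many invalid characters are present.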
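A minimal sketch of the single-pass idea, assuming a hypothetical helper named ReplaceInvalidChars (it is not part of .NET): the string is copied into a char[] once and invalid characters are overwritten in place, so only one new string is created no matter how many characters get replaced.

```csharp
using System;
using System.IO;

public static class FileNameSketch
{
    // Hypothetical helper: replaces every invalid file name character in one pass.
    public static string ReplaceInvalidChars(string fileName, char replacement = '_')
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        char[] buffer = fileName.ToCharArray();
        for (int i = 0; i < buffer.Length; i++)
        {
            // Overwrite in place; valid characters are left untouched.
            if (Array.IndexOf(invalid, buffer[i]) >= 0)
                buffer[i] = replacement;
        }
        return new string(buffer);
    }
}
```

Note that the set returned by GetInvalidFileNameChars() depends on the platform the code runs on, so the same input can give different results on Windows and on Linux.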

Gabber
Diego Jancic
  • Good call on S.I.P.GIFNC. The loop is roughly what I ended up doing, but I'm not crazy about calling string.Replace in a loop -- I was hoping there would be a builtin that was both simple *and* efficient. – Ken Mar 09 '09 at 17:07
  • 1
    You could use a StringBuilder if you wish, but if the names are short I guess it's not worth it. You could also write your own method that creates a char[] and replaces all the wrong chars in one iteration. It's always better to keep it simple unless it doesn't work; you might have worse bottlenecks – Diego Jancic Mar 10 '09 at 14:55
  • I don't know C#, but is it not possible to use a remove() method that takes a set of characters? This set of characters appears to be handily provided by GetInvalidFileNameChars(). Also, realistically, how many times will that loop iterate? 6 usually, 40 at most if the function also returns non-printing ASCII, maybe? Caveat: the msdn for that function also mentions that you should use GetInvalidPathChars, as GIFNC doesn't return a '\' or '/', which are invalid filename chars. – Pod Sep 09 '09 at 11:04
  • I don't know of any "Remove" method similar to the one you are talking about; even if it exists, how would it be faster? The only thing it could do is copy the result of GIFNC to an array to avoid the call overhead (if any). Regarding the other comment, you should use GIFNC because it is the one that returns \ and /. Use Reflector to check Path's static constructor if you wish. Here's the declaration on Windows (on Mono/Linux it might be different): – Diego Jancic Sep 09 '09 at 13:17
  • 2
    InvalidFileNameChars = new char[] { '"', '<', '>', '|', '\0', '\x0001', '\x0002', '\x0003', '\x0004', '\x0005', '\x0006', '\a', '\b', '\t', '\n', '\v', '\f', '\r', '\x000e', '\x000f', '\x0010', '\x0011', '\x0012', '\x0013', '\x0014', '\x0015', '\x0016', '\x0017', '\x0018', '\x0019', '\x001a', '\x001b', '\x001c', '\x001d', '\x001e', '\x001f', ':', '*', '?', '\\', '/' }; – Diego Jancic Sep 09 '09 at 13:19
  • 10
    The probability to have 2+ different invalid chars in the string is so small that caring about performance of string.Replace() is pointless. – Serge Wautier Mar 14 '11 at 08:20
  • There's an additional cost to create a StringBuilder object, which is more expensive than just using a plain string. I doubt it's worth using a StringBuilder in this particular scenario, as the string size and loop count are so tiny. – NickG Oct 09 '14 at 14:48
  • @NickG that's a good point. Others have mentioned this as well. It all depends on the average invalid characters you expect to have. If it will be near zero, then use a string. If you expect to always have one or more, then I would go with a StringBuilder. – Diego Jancic Oct 09 '14 at 19:06
  • 2
    Great solution, interesting aside, resharper suggested this Linq version: fileName = System.IO.Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c, '_')); I wonder if there are any possible performance improvements there. I have kept the original for readability purposes as performance is not my biggest concern. But if anyone is interested, might be worth benchmarking – chrispepper1989 Mar 24 '15 at 10:53
  • This doesn't change . (period/fullstop) characters. You may want to ensure there is only one of these for the final file extension. – AndyM May 15 '16 at 03:45
  • 1
    @AndyM No need to. `file.name.txt.pdf` is a valid pdf. Windows reads only the last `.` for the extension. – Diego Jancic May 25 '16 at 13:32
37
fileName = fileName.Replace(":", "-");

However ":" is not the only illegal character for Windows. You will also have to handle:

/, \, :, *, ?, ", <, > and |

These are contained in System.IO.Path.GetInvalidFileNameChars();

Also (on Windows), a filename cannot consist solely of "." characters (".", "..", "...", and so on are all invalid), and trailing dots are dropped. Be careful when naming files with ".", for example:

echo "test" > .test.

Will generate a file named ".test"

Lastly, if you really want to do things correctly, there are some special file names you need to look out for. On Windows you can't create files named:

CON, PRN, AUX, CLOCK$, NUL
COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9
LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
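A check for the names listed above could be sketched like this. The class and method names are hypothetical, and comparing only the part before the first dot is an assumption based on Windows treating a name such as NUL.txt the same as NUL.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNameSketch
{
    // The device names listed above, compared case-insensitively.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "CLOCK$", "NUL",
            "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // Hypothetical helper: true if the name, ignoring any extension, is a reserved device name.
    public static bool IsReservedName(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }

    // Hypothetical helper: true for ".", "..", "..." and so on.
    public static bool IsOnlyDots(string fileName) =>
        fileName.Length > 0 && fileName.Trim('.').Length == 0;
}
```

What to do when a name is rejected (prefix it, append a character, ask the user) is left open here, since the right choice depends on the application.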
Phil Price
17

This isn't more efficient, but it's more fun :)

var fileName = "foo:bar";
var invalidChars = System.IO.Path.GetInvalidFileNameChars();
var cleanFileName = new string(fileName.Where(m => !invalidChars.Contains(m)).ToArray<char>());
Uwe Keim
Joseph Gabriel
15

In case anyone wants an optimized version based on StringBuilder, use this. Includes rkagerer's trick as an option.

static char[] _invalids;

/// <summary>Replaces characters in <c>text</c> that are not allowed in 
/// file names with the specified replacement character.</summary>
/// <param name="text">Text to make into a valid filename. The same string is returned if it is valid already.</param>
/// <param name="replacement">Replacement character, or null to simply remove bad characters.</param>
/// <param name="fancy">Whether to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
/// <returns>A string that can be used as a filename. If the output string would otherwise be empty, returns "_".</returns>
public static string MakeValidFileName(string text, char? replacement = '_', bool fancy = true)
{
    StringBuilder sb = new StringBuilder(text.Length);
    var invalids = _invalids ?? (_invalids = Path.GetInvalidFileNameChars());
    bool changed = false;
    for (int i = 0; i < text.Length; i++) {
        char c = text[i];
        if (invalids.Contains(c)) {
            changed = true;
            var repl = replacement ?? '\0';
            if (fancy) {
                if (c == '"')       repl = '”'; // U+201D right double quotation mark
                else if (c == '\'') repl = '’'; // U+2019 right single quotation mark (never reached: ' is a valid file name character)
                else if (c == '/')  repl = '⁄'; // U+2044 fraction slash
            }
            if (repl != '\0')
                sb.Append(repl);
        } else
            sb.Append(c);
    }
    if (sb.Length == 0)
        return "_";
    return changed ? sb.ToString() : text;
}
spottedmahn
Qwertie
  • +1 for nice and readable code. Makes very easy to read & notice the bugs :P.. This function should return always original string as changed will never be true. – Erti-Chris Eelmaa Aug 24 '14 at 18:33
  • Thanks, I think it's better now. You know what they say about open source, "many eyes make all bugs shallow so I don't have to write unit tests"... – Qwertie Aug 25 '14 at 16:08
11

Here's a version of the accepted answer using Linq which uses Enumerable.Aggregate:

string fileName = "something";

// Aggregate returns the cleaned string, so the result must be assigned.
fileName = Path.GetInvalidFileNameChars()
    .Aggregate(fileName, (current, c) => current.Replace(c, '_'));
DavidG
10

A simple one-liner:

var validFileName = Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));

You can wrap it in an extension method if you want to reuse it.

public static string ToValidFileName(this string fileName) => Path.GetInvalidFileNameChars().Aggregate(fileName, (f, c) => f.Replace(c, '_'));
Moch Yusup
9

Here's a slight twist on Diego's answer.

If you're not afraid of Unicode, you can retain a bit more fidelity by replacing the invalid characters with valid Unicode symbols that resemble them. Here's the code I used in a recent project involving lumber cutlists:

static string MakeValidFilename(string text) {
  text = text.Replace('\'', '’'); // U+2019 right single quotation mark
  text = text.Replace('"',  '”'); // U+201D right double quotation mark
  text = text.Replace('/', '⁄');  // U+2044 fraction slash
  foreach (char c in System.IO.Path.GetInvalidFileNameChars()) {
    text = text.Replace(c, '_');
  }
  return text;
}

This produces filenames like 1⁄2” spruce.txt instead of 1_2_ spruce.txt

Yes, it really works:

[Image: Explorer sample]

Caveat Emptor

I knew this trick would work on NTFS but was surprised to find it also works on FAT and FAT32 partitions. That's because long filenames are stored in Unicode, even as far back as Windows 95/NT. I tested on Win7, XP, and even a Linux-based router and they showed up OK. Can't say the same for inside a DOSBox.

That said, before you go nuts with this, consider whether you really need the extra fidelity. The Unicode look-alikes could confuse people or old programs, e.g. older OSes relying on code pages.

Community
rkagerer
8

Diego does have the correct solution, but there is one very small mistake in there. The version of string.Replace being used should be string.Replace(char, char); there isn't a string.Replace(char, string).

I can't edit the answer or I would have just made the minor change.

So it should be:

string fileName = "something";
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
   fileName = fileName.Replace(c, '_');
}
leggetter
6

Another simple solution:

private string MakeValidFileName(string original, char replacementChar = '_')
{
  var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
  return new string(original.Select(c => invalidChars.Contains(c) ? replacementChar : c).ToArray());
}
GDemartini
5

Here's a version that uses StringBuilder and IndexOfAny with bulk append for efficiency. When there is nothing to replace, it returns the original string rather than creating a duplicate.

Last but not least, it has a switch statement that returns look-alike characters which you can customize any way you wish. Check out Unicode.org's confusables lookup to see what options you might have, depending on the font.

public static string GetSafeFilename(string arbitraryString)
{
    var invalidChars = System.IO.Path.GetInvalidFileNameChars();
    var replaceIndex = arbitraryString.IndexOfAny(invalidChars, 0);
    if (replaceIndex == -1) return arbitraryString;

    var r = new StringBuilder();
    var i = 0;

    do
    {
        r.Append(arbitraryString, i, replaceIndex - i);

        switch (arbitraryString[replaceIndex])
        {
            case '"':
                r.Append("''");
                break;
            case '<':
                r.Append('\u02c2'); // '˂' (modifier letter left arrowhead)
                break;
            case '>':
                r.Append('\u02c3'); // '˃' (modifier letter right arrowhead)
                break;
            case '|':
                r.Append('\u2223'); // '∣' (divides)
                break;
            case ':':
                r.Append('-');
                break;
            case '*':
                r.Append('\u2217'); // '∗' (asterisk operator)
                break;
            case '\\':
            case '/':
                r.Append('\u2044'); // '⁄' (fraction slash)
                break;
            case '\0':
            case '\f':
            case '?':
                break;
            case '\t':
            case '\n':
            case '\r':
            case '\v':
                r.Append(' ');
                break;
            default:
                r.Append('_');
                break;
        }

        i = replaceIndex + 1;
        replaceIndex = arbitraryString.IndexOfAny(invalidChars, i);
    } while (replaceIndex != -1);

    r.Append(arbitraryString, i, arbitraryString.Length - i);

    return r.ToString();
}

It doesn't check for ., .., or reserved names like CON because it isn't clear what the replacement should be.

jnm2
3

After cleaning up my code and doing a little refactoring, I created an extension method for the string type:

public static string ToValidFileName(this string s, char replaceChar = '_', char[] includeChars = null)
{
  var invalid = Path.GetInvalidFileNameChars();
  if (includeChars != null) invalid = invalid.Union(includeChars).ToArray();
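  // invalid.Contains (System.Linq) replaces the non-standard In() helper, which is not part of .NET.
  return string.Join(string.Empty, s.ToCharArray().Select(o => invalid.Contains(o) ? replaceChar : o));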
}

Now it's easier to use with:

var name = @"Any string you want using ? / \ or even +.zip";
var validFileName = name.ToValidFileName();

If you want to replace with a different char than "_" you can use:

var validFileName = name.ToValidFileName(replaceChar:'#');

And you can add chars to replace. For example, if you don't want spaces or commas:

var validFileName = name.ToValidFileName(includeChars: new [] { ' ', ',' });

Hope it helps...

Cheers

Joan Vilariño
1

I needed a system that couldn't create collisions so I couldn't map multiple characters to one. I ended up with:

public static class Extension
{
    /// <summary>
    /// Characters allowed in a file name. Note that curly braces don't show up here
    /// because they are used for escaping invalid characters.
    /// </summary>
    private static readonly HashSet<char> CleanFileNameChars = new HashSet<char>
    {
        ' ', '!', '#', '$', '%', '&', '\'', '(', ')', '+', ',', '-', '.',
        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '=', '@',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
        '[', ']', '^', '_', '`',
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    };

    /// <summary>
    /// Creates a clean file name from one that may contain invalid characters in 
    /// a way that will not collide.
    /// </summary>
    /// <param name="dirtyFileName">
    /// The file name that may contain invalid filename characters.
    /// </param>
    /// <returns>
    /// A file name that does not contain invalid filename characters.
    /// </returns>
    /// <remarks>
    /// <para>
    /// Escapes invalid characters by converting their ASCII values to hexadecimal
    /// and wrapping that value in curly braces. Curly braces are escaped by doubling
    /// them, for example '{' => "{{".
    /// </para>
    /// <para>
    /// Note that although NTFS allows unicode characters in file names, this
    /// method does not.
    /// </para>
    /// </remarks>
    public static string CleanFileName(this string dirtyFileName)
    {
        string EscapeHexString(char c) =>
            "{" + (c > 255 ? $"{(uint)c:X4}" : $"{(uint)c:X2}") + "}";

        return string.Join(string.Empty,
                           dirtyFileName.Select(
                               c =>
                                   c == '{' ? "{{" :
                                   c == '}' ? "}}" :
                                   CleanFileNameChars.Contains(c) ? $"{c}" :
                                   EscapeHexString(c)));
    }
}
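Because the escaping is unambiguous, it can also be reversed. The following is only a sketch of a matching decoder, under the assumption that the input was produced by the scheme described above ("{{" and "}}" for literal braces, "{XX}" or "{XXXX}" for hex-escaped characters); the class and method names are hypothetical, and malformed input is not handled.

```csharp
using System;
using System.Text;

public static class CleanFileNameDecoderSketch
{
    // Hypothetical inverse of the escaping scheme; assumes well-formed input.
    public static string UncleanFileName(string cleanFileName)
    {
        var sb = new StringBuilder(cleanFileName.Length);
        for (int i = 0; i < cleanFileName.Length; i++)
        {
            char c = cleanFileName[i];
            if (c == '{')
            {
                if (i + 1 < cleanFileName.Length && cleanFileName[i + 1] == '{')
                {
                    sb.Append('{'); // "{{" encodes one literal '{'
                    i++;
                }
                else
                {
                    // "{3A}" style escape: parse the hex digits between the braces.
                    int end = cleanFileName.IndexOf('}', i);
                    string hex = cleanFileName.Substring(i + 1, end - i - 1);
                    sb.Append((char)Convert.ToInt32(hex, 16));
                    i = end;
                }
            }
            else if (c == '}')
            {
                sb.Append('}'); // "}}" encodes one literal '}'
                i++;
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```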
mheyman
0

I needed to do this today... in my case, I needed to concatenate a customer name with the date and time for a final .kmz file. My final solution was this:

 string name = "Whatever name with valid/invalid chars";
 char[] invalid = System.IO.Path.GetInvalidFileNameChars();
 string validFileName = string.Join(string.Empty,
                            string.Format("{0}.{1:G}.kmz", name, DateTime.Now)
                            .ToCharArray().Select(o => invalid.Contains(o) ? '_' : o)); // Contains comes from System.Linq

You can even make it replace spaces if you add the space char to the invalid array.

Maybe it's not the fastest, but as performance wasn't an issue, I found it elegant and understandable.

Cheers!

Joan Vilariño
0

There are no fully valid answers in this topic yet. The author said: "...I want to use as a filename...". Removing or replacing invalid characters is not enough to make something usable as a filename. You should at least check that:

  1. You don't already have a file with that name in the folder where you want to create the new one
  2. The total path to the file (path to folder + filename + extension) is no longer than MAX_PATH (260 characters). Yes, there are ways around this limit on recent versions of Windows, but if you want your app to work reliably, you should check it
  3. You don't use any special filenames (see the answer by @Phil Price)

Probably the best way would be to:

  1. Remove bad characters using one of the other answers here.
  2. Make sure the total path is less than 260 characters (if not, remove the last N chars)
  3. Make sure a file with the given filename doesn't exist (if it does, replace the last N chars until you find an available filename)
  4. Make sure you don't use any reserved filenames (if you do, replace the last N chars until you find a proper and available filename)

As always, things are more complicated than they look. It is better to use an existing function, such as GetTempFileNameW.
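The first three steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: the helper name, the injected exists callback (in real code it could be File.Exists), the "_N" counter used in place of "replace the last N chars", and the space reserved for that counter. The reserved-name check from step 4 is left out.

```csharp
using System;
using System.IO;

public static class UsablePathSketch
{
    private const int MaxPath = 260;   // MAX_PATH, as cited above
    private const int SuffixRoom = 8;  // assumed space reserved for a "_N" counter

    // Hypothetical helper. 'exists' is injected so the logic can be tried without touching the disk.
    public static string MakeUsablePath(string folder, string rawName, string extension,
                                        Func<string, bool> exists)
    {
        // Step 1: replace invalid characters.
        string name = rawName;
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');
        if (name.Length == 0) name = "_";

        // Step 2: trim the name so folder + separator + name + extension stays below MaxPath.
        int budget = MaxPath - 1 - (folder.Length + 1) - extension.Length - SuffixRoom;
        if (budget < 1)
            throw new ArgumentException("Folder path leaves no room for a file name.", nameof(folder));
        if (name.Length > budget) name = name.Substring(0, budget);

        // Step 3: append a counter until the candidate is not taken.
        string candidate = Path.Combine(folder, name + extension);
        for (int n = 1; exists(candidate); n++)
            candidate = Path.Combine(folder, name + "_" + n + extension);
        return candidate;
    }
}
```

Even with a check like this, another process can create the same file between the check and the write, so the existence test only narrows the window rather than closing it.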

Ezh
0

Here is yet another solution, which I have been using for the last ~10 years. It is very similar to the previous solutions, without the 'fancy' parts. The main method takes the special characters as input, since I was also using it for other purposes, e.g. getting web-compatible names, especially back when renaming files for SharePoint/OneDrive.

Not sure how much it improves the speed, but I also chose to check the filename for any special characters with IndexOfAny() BEFORE using the StringBuilder.

private static string SanitizeFilename(this string filename) 
   => filename.RemoveOrReplaceSpecialCharacters(Path.GetInvalidFileNameChars(), '_');

private static string RemoveOrReplaceSpecialCharacters(this string str, char[] specialCharacters, char? replaceChar)
{
    if (string.IsNullOrEmpty(str))
        return str;
    if (specialCharacters == null || specialCharacters.Length == 0)
        return str;

    // IndexOfAny returns -1 when no special character is present.
    if (str.IndexOfAny(specialCharacters) < 0)
        return str;

    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        if (!specialCharacters.Contains(c))
            sb.Append(c);
        else if (replaceChar.HasValue)
            sb.Append(replaceChar.Value);
    }
    return sb.ToString();         
}

I tried also

return new string(str.Except(specialCharacters).ToArray());

but it behaved strangely: duplicates are dropped, among other issues. For instance, "Bla-ID" became "BlaI" when specifying - as the single special char.
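The duplicate-dropping comes from Enumerable.Except being a set operation: it yields only the distinct elements of the input that are not in the second sequence. A small sketch comparing it with a Where filter, which keeps repeated characters (the class and method names are hypothetical):

```csharp
using System;
using System.Linq;

public static class ExceptSketch
{
    // Set difference: removes the special characters but also collapses repeated characters.
    public static string RemoveWithExcept(string s, char[] special) =>
        new string(s.Except(special).ToArray());

    // Plain filter: removes the special characters and keeps everything else as is.
    public static string RemoveWithWhere(string s, char[] special) =>
        new string(s.Where(c => !special.Contains(c)).ToArray());
}
```

With '-' as the only special character, RemoveWithExcept("Bla-Bla", ...) gives "Bla", while RemoveWithWhere gives "BlaBla".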

EricBDev
-2

You can do this with a sed command:

 sed -e "
 s/[?()\[\]=+<>:;©®”,*|]/_/g
 s/"$'\t'"/ /g
 s/–/-/g
 s/\"/_/g
 s/[[:cntrl:]]/_/g"
D W
  • also see a more complicated but related question at: http://stackoverflow.com/questions/4413427/automated-renaming-of-linux-filenames-to-a-new-filenames-that-are-legal-in-window – D W Dec 11 '10 at 01:02
  • Why does this need to be done in C# rather than Bash? I see now a tag of C# on the original question, but why? – D W Oct 18 '16 at 01:59
  • 2
    I know, right, why not just shell out from the C# application to Bash that might not be installed to accomplish this? – Peter Ritchie Oct 18 '16 at 21:45