4

Here's the scenario.

I'm calling a database and pulling in about 3,000 records, which are child nodes of parent documents. From each child node I work back up the tree to generate a flat folder name that looks something like this:

Entertainment Categories ~ Artistic Entertainment ~ Calligraphy Artists ~ Megumi - Japanese Calligraphy Artist Sussex ~ 6546

The name runs from the top-level parent down to the child, followed by the child ID.

I then iterate over these folder names and create a new folder for each one. This all works fine: I run the names through a loop using Path.GetInvalidFileNameChars to strip any characters that would cause the folder creation to fail.

However, when it comes to zipping these folders up using the built-in zip function in Windows (right click > send to > compressed zip folder), I keep getting errors:

[Folder name] cannot be compressed because it includes characters that cannot be used in a compressed folder, such as [foo]. You should rename this file or directory.

It would be fine if the error message actually told me the full range of characters that cannot be included in compressed folder names, but it doesn't, so whenever I replace one character a new one pops up in the error message. This is what I'm doing to clean the folder name at the moment:

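// Requires: using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
private static void CleanPath(StringBuilder path)
{
    // Start from the characters .NET reports as invalid in file names.
    List<string> invalidFolderCharacters = Path.GetInvalidFileNameChars()
                                           .Select(x => x.ToString()).ToList();

    // Extra characters added by hand as the compressed-folder errors appeared.
    invalidFolderCharacters.Add("–"); // en dash
    invalidFolderCharacters.Add("`"); // backtick
    invalidFolderCharacters.Add("'"); // apostrophe
    invalidFolderCharacters.Add("′"); // prime

    // Remove every occurrence of each listed character.
    foreach (string s in invalidFolderCharacters)
    {
        path.Replace(s, string.Empty);
    }
}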

As you can see, I'm having to add to the characters returned by GetInvalidFileNameChars() each time a new error pops up.

So my question is: is there a built-in function in the .NET Framework that gives me the characters that aren't allowed in file/folder names as well as the characters that cannot be in compressed folder names? Failing that, can anyone tell me which characters aren't allowed in compressed folder names so that I can create one myself?

Harry Johnston
DGibbs
  • Found [this](http://superuser.com/questions/476740/error-while-zipping-files-with-unicode-characters-in-names-with-win7s-send-to) article. It says `The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437` – Guru Stron Dec 12 '13 at 09:31
  • I recommend you use a whitelist instead. Zap everything except alphanumerics, spaces, and a few other characters that you've tested and that seem reasonable. Remember that it (presumably) has to work on every other implementation of zip out there, not just on the one you're using now. – Harry Johnston Dec 12 '13 at 22:55
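The whitelist idea from the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `FolderNameWhitelist`, the method `Clean`, and the `ExtraAllowed` set are hypothetical, and the choice of extra characters has not been tested against every zip implementation.

```csharp
using System.Text;

public static class FolderNameWhitelist
{
    // Hypothetical set of extra characters to keep; adjust after testing.
    private const string ExtraAllowed = " ~-_.()";

    public static string Clean(string name)
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            // Keep ASCII letters and digits only, so accented or
            // typographic characters are dropped as well.
            bool asciiAlnum = (c >= 'a' && c <= 'z')
                           || (c >= 'A' && c <= 'Z')
                           || (c >= '0' && c <= '9');
            if (asciiAlnum || ExtraAllowed.IndexOf(c) >= 0)
            {
                sb.Append(c);
            }
        }
        // Windows does not keep trailing spaces or dots in folder names.
        return sb.ToString().TrimEnd(' ', '.');
    }
}
```

Because anything not explicitly listed is dropped, a newly encountered character such as an en dash or a curly apostrophe needs no further change to the code.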

3 Answers

4

There is also a method called Path.GetInvalidPathChars.
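A quick way to see whether either built-in set covers a given character is a check like the one below. It is a minimal sketch; the class `InvalidCharCheck` and the method `IsRejectedByPathApi` are hypothetical names introduced for illustration.

```csharp
using System.IO;
using System.Linq;

public static class InvalidCharCheck
{
    // True if either built-in set lists the character as invalid.
    public static bool IsRejectedByPathApi(char c)
    {
        return Path.GetInvalidFileNameChars().Contains(c)
            || Path.GetInvalidPathChars().Contains(c);
    }
}
```

Calling `InvalidCharCheck.IsRejectedByPathApi('\u2013')` with the en dash returns false, which matches what the comments under this answer report: the en dash is a legal folder name character, so neither method will flag it.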

Steve
  • I've already tried using this; unfortunately it gave the same result – DGibbs Dec 12 '13 at 09:22
  • Strange, the set should be complete. Unless there is something related to a different Operating System. – Steve Dec 12 '13 at 09:24
  • Or it is something related to the built-in functionality, because I am pretty sure that the `–` char is not an invalid path char – Steve Dec 12 '13 at 09:26
  • The en dash isn't an invalid path/file name char but it's not allowed in compressed folder names – DGibbs Dec 12 '13 at 09:28
  • Confirmed, just tried here. en-dash is accepted when creating a new folder, but that folder cannot be compressed. – Steve Dec 12 '13 at 09:32
  • The command-line tool COMPACT ignores these chars, but I don't know if it is a viable solution: it doesn't build a zip file but instead marks the folder as compressed, so every file added there is automatically compressed. Otherwise, I think you should extend your replace method to all chars that are outside the standard ANSI set. – Steve Dec 12 '13 at 09:46
  • I ended up trying the compress again, and it complained about `’`. I added this to the list of characters I check against and it zipped up fine. Kinda annoying that the error message doesn't just tell you what characters aren't allowed, but at least now I know. Thanks for your help with this – DGibbs Dec 12 '13 at 09:52
  • Unfortunately, you've answered the question in the (original) title rather than the question in the question. :-) – Harry Johnston Dec 12 '13 at 22:54
  • @DGibbs: I found this thread while looking for an answer to my own question. Is there any way to preserve an em dash in a file name when creating a zip? https://stackoverflow.com/questions/50250250 – Amit Kumar May 09 '18 at 09:57
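Steve's suggestion in the comments above, extending the replace step to everything outside the standard set, might look something like this. It is a sketch that assumes "standard ANSI" means 7-bit ASCII; the class `AsciiOnly` and the method `StripNonAscii` are hypothetical names, and it takes a `StringBuilder` to match the question's `CleanPath`.

```csharp
using System.Text;

public static class AsciiOnly
{
    // Removes every character above U+007F from the builder in place.
    public static void StripNonAscii(StringBuilder path)
    {
        // Walk backwards so removals do not shift unvisited indexes.
        for (int i = path.Length - 1; i >= 0; i--)
        {
            if (path[i] > '\u007F')
            {
                path.Remove(i, 1);
            }
        }
    }
}
```

ASCII characters such as the backtick and the straight apostrophe are left alone by this pass, so it would complement the `Path.GetInvalidFileNameChars` filtering rather than replace it.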
0

One reason you might get this message is because of Unicode characters. However, trying to figure out if a string is Unicode from your program will be a challenge. See When is a string not a string?

I would try to PInvoke the native IsTextUnicode function.

Here are a couple of other suggestions:
https://stackoverflow.com/a/4459679/2596334
https://stackoverflow.com/a/1522897/2596334

Community
Scott Solmer
-1

This is pulled from a JavaScript project I wrote, which was originally compiled from this Wikipedia article:

var contain = ['"', '*', ':', '<', '>', '?', '\\', '/', '|', '+', '[', ']'];
var fullname = ['AUX', 'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9', 'CON', 'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9', 'NUL', 'PRN'];

Filenames cannot contain any of the characters in the first var, and cannot be named exactly the same as any of the names in the second var.
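Since the question is about .NET, the same two lists could be applied in C# along these lines. This is a rough sketch: the class `PortableNameCheck` and the method `IsPortable` are hypothetical, and comparing reserved names case-insensitively is an assumption added here, not something the lists above state.

```csharp
using System;
using System.Collections.Generic;

public static class PortableNameCheck
{
    // Characters from the first list above.
    private static readonly char[] Forbidden =
        { '"', '*', ':', '<', '>', '?', '\\', '/', '|', '+', '[', ']' };

    // Names from the second list above; case-insensitive by assumption.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "AUX", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7",
            "COM8", "COM9", "CON", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5",
            "LPT6", "LPT7", "LPT8", "LPT9", "NUL", "PRN"
        };

    public static bool IsPortable(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;
        if (name.IndexOfAny(Forbidden) >= 0) return false;
        // Only an exact match on the whole name is treated as reserved.
        return !Reserved.Contains(name);
    }
}
```

Note that this check says nothing about characters such as the en dash, which is the case the question is actually about.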

Danny Beckett
  • Your list doesn't contain some of the characters added by the OP. – Guru Stron Dec 12 '13 at 09:27
  • The `CleanPath()` method already checks against those characters in addition to others – DGibbs Dec 12 '13 at 10:09
  • I believe you've answered the question in the (original) title rather than the question in the question. :-) – Harry Johnston Dec 12 '13 at 22:54
  • I just created a folder named `+[]` and added it to a zip file using Windows drag and drop. Windows thinks they are valid zip folder name characters. – jnm2 May 29 '15 at 13:48
  • @jnm2 The characters I chose to include were to maximise compatibility between different filesystems and OS's. According to Wikipedia, `+[]` are invalid under FAT32. – Danny Beckett May 30 '15 at 12:38