I am using System.IO.Compression to extract the contents of some zip files. The problem is that whenever an entry has a filename containing characters that are illegal on Windows, an exception is thrown. I have tried several things, but I still haven't found a way to disregard the bad entries and extract the good ones. Please note that modifying the content of the zip file is not an option for the type of processing we are performing, so I must process the file as it is.

The system usually processes files with several entries. The number varies, but it can be up to 300 entries in one zip file, and occasionally there will be an entry with a filename such as 'myfile<name>.txt', which contains angle brackets that are illegal characters on Windows. I want to disregard this entry and move on to extract the rest of the entries within the ZipArchive, but it looks like this is not possible.

Any idea on how to disregard the bad entries of a ZipArchive?

So far I have tried different ways of getting the entries separately, but I always get the exact same exception.

Here are some of the things I have tried so far:

  • Implementing the regular way to iterate over the entries:

    foreach (ZipArchiveEntry entry in zipArchive.Entries)
    
  • Trying to get only one entry by index (same exception here even though the first entry is a valid one):

    ZipArchiveEntry entry = zipArchive.Entries[0];
    
  • Applying a filter using a lambda expression to disregard the invalid entries (same exception also):

    var entries = zipArchive.Entries.Where(a => 
    a.FullName.IndexOfAny(Path.GetInvalidFileNameChars() ) == -1);
    

None of this helps, and the exception I get every single time has the following stack trace:

    at System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional)
    at System.IO.Path.GetFileName(String path)
    at System.IO.Compression.ZipHelper.EndsWithDirChar(String test)
    at System.IO.Compression.ZipArchiveEntry.set_FullName(String value)
    at System.IO.Compression.ZipArchiveEntry..ctor(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
    at System.IO.Compression.ZipArchive.ReadCentralDirectory()
    at System.IO.Compression.ZipArchive.get_Entries()
    at ZipLibraryConsole.MicrosoftExtraction.RecursiveExtract(Stream fileToExtract, Int32 maxDepthLevel, Attachment att) in C:\Users\myUser\Documents\Visual Studio 2015\Projects\ZipLibraryConsole\ZipLibraryConsole\MicrosoftExtraction.cs:line 47

This is a snippet of the implemented code:

    var zipArchive = new ZipArchive(fileToExtract, ZipArchiveMode.Read);
    try
    {
        // The exception is thrown here, when Entries is first read,
        // so there is no chance to process the valid entries at all.
        foreach (var zipEntry in zipArchive.Entries)
        {
            // Do something and extract the file
        }
    }
    catch (ArgumentException exception)
    {
        Console.WriteLine(
            String.Format("Failed to complete the extraction. At least one path contains invalid characters for the Operating System: {0}{1}",
                att.Name, att.Extention));
    }
Ken White
  • Have you tried with a different library, e.g. DotNetZip or SharpZipLib? – Thomas Levesque Feb 15 '17 at 23:42
  • This is a known bug. You'll have to use some library other than .NET, one which can accommodate .zip archives that have entries with names that don't comply with the Windows rules. https://connect.microsoft.com/VisualStudio/feedback/details/808187/ziparchive-does-not-handle-archives-containing-items-with-names-that-have-windows-prohibited-characters-in-them – Peter Duniho Feb 16 '17 at 00:15
  • We are currently using SharpZipLib, and it works very well, but we have recently had problems with some zip files. The effect is similar to a zip bomb, except that in this case the zip files are corrupted. Other libraries identify these quickly, but SharpZipLib does not: it goes into an infinite loop, appending bytes to the extracted file on the hard drive until it runs out of space. So we are evaluating other libraries, which is how we came across Microsoft's zip library. – Juan Luis Hidalgo Feb 16 '17 at 00:45
  • Hey Peter, your link is very useful, thanks for sharing it. It is a shame that this bug is not going to be fixed in .NET anytime soon. Apart from this problem, the library worked very well for us during the proof of concept as a replacement for SharpZipLib. Oh man, I'll have to look for another option. – Juan Luis Hidalgo Feb 16 '17 at 00:55
  • The issue on Connect is closed as "won't fix" (as is often the case on Connect), but it seems to have been fixed in .NET Core: https://github.com/dotnet/corefx/issues/4991 – Thomas Levesque Feb 16 '17 at 01:12
  • Thanks Thomas, your answer is much appreciated. – Juan Luis Hidalgo Feb 17 '17 at 23:36

1 Answer

Using System.Reflection you can at least hide the errors, although you only get entries up to the one with the path containing illegal characters.

Add this class and call archive.GetRawEntries() instead of archive.Entries:

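using System.Collections.Generic;
using System.IO.Compression;
using System.Reflection;

public static class ZipArchiveHelper
{
    // Private members of ZipArchive, looked up once via reflection.
    private static readonly FieldInfo _Entries;
    private static readonly MethodInfo _EnsureDirRead;

    static ZipArchiveHelper()
    {
        _Entries = typeof(ZipArchive).GetField("_entries", BindingFlags.NonPublic | BindingFlags.Instance);
        _EnsureDirRead = typeof(ZipArchive).GetMethod("EnsureCentralDirectoryRead", BindingFlags.NonPublic | BindingFlags.Instance);
    }

    public static List<ZipArchiveEntry> GetRawEntries(this ZipArchive archive)
    {
        // Reading stops at the first entry with an illegal name; the error is swallowed.
        try { _EnsureDirRead.Invoke(archive, null); } catch { }
        // Return whatever entries were read before the failure.
        return (List<ZipArchiveEntry>)_Entries.GetValue(archive);
    }
}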

The try-catch is ugly, and you could catch a specific exception if it bugs you. According to the comments above, this is fixed in .NET Core. (UPDATE: Confirmed this is fixed in .NET Core 3.1, maybe earlier.)
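
As a rough sketch of what catching a specific exception could look like: a method called through reflection reports its failure wrapped in a TargetInvocationException, so the filter below only swallows the case where the inner exception is the ArgumentException from the question's stack trace. The class and method names (ZipArchiveSafeEntries, GetEntriesUpToFirstInvalid) are made up for illustration, and the private member names are the same implementation details used above, so they may not exist in every framework version.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Reflection;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class ZipArchiveSafeEntries
{
    // Private ZipArchive members (implementation details, assumed to match the answer above).
    private static readonly FieldInfo EntriesField =
        typeof(ZipArchive).GetField("_entries", BindingFlags.NonPublic | BindingFlags.Instance);
    private static readonly MethodInfo EnsureDirRead =
        typeof(ZipArchive).GetMethod("EnsureCentralDirectoryRead", BindingFlags.NonPublic | BindingFlags.Instance);

    public static IReadOnlyList<ZipArchiveEntry> GetEntriesUpToFirstInvalid(this ZipArchive archive)
    {
        // If the private members are not found, fall back to the public API.
        if (EntriesField == null || EnsureDirRead == null)
            return archive.Entries;

        try
        {
            EnsureDirRead.Invoke(archive, null);
        }
        catch (TargetInvocationException ex) when (ex.InnerException is ArgumentException)
        {
            // Only the "illegal characters in path" case is swallowed here;
            // any other failure (for example a corrupt archive) still propagates.
        }

        // Entries read before the failure, or an empty list if none were stored.
        return EntriesField.GetValue(archive) as List<ZipArchiveEntry>
               ?? new List<ZipArchiveEntry>();
    }
}
```

Usage stays the same as before: `foreach (var entry in archive.GetEntriesUpToFirstInvalid()) { ... }`. The limitation is unchanged too: entries after the first bad one are still missing.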

Credit for this (partial) fix to https://www.codeproject.com/Tips/1007398/Avoid-Illegal-Characters-in-Path-error-in-ZipArchi and https://gist.github.com/rdavisau/b66df9c99a4b11c5ceff

More pointers on fixing paths with illegal characters (not just in zip files) can be found in the related question: ZipFile.ExtractToDirectory "Illegal characters in path"
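
On a runtime where archive.Entries can be enumerated without throwing (such as .NET Core, per the update above), skipping the bad entries during extraction could look roughly like the sketch below. The class SafeZipExtraction and its methods are hypothetical names. The check uses a fixed list of characters that Windows rejects, as an assumption, rather than Path.GetInvalidFileNameChars(), whose result depends on the operating system. It does not cover other Windows rules such as reserved device names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class SafeZipExtraction
{
    // Assumed list: control characters plus < > : " | ? * are rejected in Windows file names.
    private static bool IsIllegalOnWindows(char c) => c < 32 || "<>:\"|?*".IndexOf(c) >= 0;

    // True when every path segment of the entry name avoids the characters above.
    public static bool HasLegalName(string entryFullName)
    {
        return entryFullName
            .Split(new[] { '/', '\\' }, StringSplitOptions.RemoveEmptyEntries)
            .All(segment => !segment.Any(IsIllegalOnWindows));
    }

    // Extracts the legally named entries and returns the number of skipped entries.
    public static int ExtractLegalEntries(ZipArchive archive, string targetDirectory)
    {
        string root = Path.GetFullPath(targetDirectory).TrimEnd(Path.DirectorySeparatorChar);
        int skipped = 0;

        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (!HasLegalName(entry.FullName)) { skipped++; continue; }

            // Skip entries whose resolved path would land outside the target directory.
            string destination = Path.GetFullPath(Path.Combine(root, entry.FullName));
            if (!destination.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
            {
                skipped++;
                continue;
            }

            // An empty Name means the entry is a directory.
            if (entry.Name.Length == 0) { Directory.CreateDirectory(destination); continue; }

            Directory.CreateDirectory(Path.GetDirectoryName(destination));
            using (Stream source = entry.Open())
            using (FileStream target = File.Create(destination))
            {
                source.CopyTo(target);
            }
        }
        return skipped;
    }
}
```

The returned count makes it easy to log how many entries were ignored. On the .NET Framework versions affected by the bug, this loop alone does not help, because reading archive.Entries throws before the first iteration.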

Jon R