
Given the following code:

using (var data = new MemoryStream(bytes))
using (var archive = new ZipArchive(data))
{
    foreach (var entry in archive.Entries)
    {
        entry.FullName.Log();
    }
    ...

The exception thrown is:

Illegal characters in path.

On the foreach line.

How do I work out which entry is affected? Whenever I try to access the entries, the exception is thrown. This specific archive appears to have been created on a Mac, as it contains the __MACOSX folder.

The stack trace:

[ArgumentException: Illegal characters in path.]
   System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional) +14351233
   System.IO.Path.GetFileName(String path) +29
   System.IO.Compression.ZipHelper.EndsWithDirChar(String test) +9
   System.IO.Compression.ZipArchiveEntry.set_FullName(String value) +93
   System.IO.Compression.ZipArchiveEntry..ctor(ZipArchive archive, ZipCentralDirectoryFileHeader cd) +228
   System.IO.Compression.ZipArchive.ReadCentralDirectory() +172
   System.IO.Compression.ZipArchive.get_Entries() +36
Tom Gullen
  • Try..catch block inside the foreach maybe? – Nekeniehl Mar 29 '21 at 09:22
  • @Nekeniehl doesn't work, it throws on the `foreach` line (when `archive.Entries` is called) – Tom Gullen Mar 29 '21 at 09:22
  • I can't remember exactly, but I think I looked at the code one time and the implementation is validating the zip file filenames against illegal characters as defined by .NET. – ProgrammingLlama Mar 29 '21 at 09:23
  • @Llama any way to detect these invalid entry names before the exception is thrown? The zip has quite a lot of files in it when I open in Winrar, but nothing looks out of place (Winrar might be modifying the archive when I open it to work on Windows) – Tom Gullen Mar 29 '21 at 09:24
  • Does this answer your question? [ZipFile.ExtractToDirectory "Illegal characters in path"](https://stackoverflow.com/questions/23407717/zipfile-extracttodirectory-illegal-characters-in-path) – d4zed Mar 29 '21 at 09:25
  • What's the stack trace on the exception? That would help us narrow down which line of ZipArchive is throwing the exception, and maybe find a route around it – canton7 Mar 29 '21 at 09:25
  • I think your problem is in the stream, but anyway you could check the name by https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getinvalidfilenamechars?view=net-5.0 – Nekeniehl Mar 29 '21 at 09:26
  • @canton7 updated question with stack trace – Tom Gullen Mar 29 '21 at 09:28
  • @d4zed we can't use `ZipFile` as it requires the archive to be on disk, we must load it into memory for performance reasons. – Tom Gullen Mar 29 '21 at 09:28
  • .NET Framework I assume? That code doesn't seem to be present in .NET Core. – canton7 Mar 29 '21 at 09:30
  • @canton7 yes, 4.8 – Tom Gullen Mar 29 '21 at 09:30
  • Related: https://github.com/dotnet/runtime/issues/15938 – canton7 Mar 29 '21 at 09:32
  • @canton7 thanks for the link - looks to be the issue. So guessing at this stage it's not possible to catch/fix this for affected zips. – Tom Gullen Mar 29 '21 at 09:35
  • I suspect you might be out of luck: that exception is thrown from the `ZipArchiveEntry` ctor, and that's used when parsing the zip directory header. There doesn't seem to be any other way of getting the information in the header. The entries will be partially populated when your exception is thrown, but because `_readEntries` will be false, I can't see a way of accessing it without trying to read the directory header again – canton7 Mar 29 '21 at 09:37
  • You could use reflection / the debugger to read `_entries` / `_entriesCollection` after calling `.Entries` and catching the resulting exception. That will tell you which entries were successfully read, so the one that failed is the one after the last entry. But you could also use the debugger to step into `.Entries` and see which one it's failing on. – canton7 Mar 29 '21 at 09:39
  • Note that the issue I linked was fixed, 5 years ago. The problem is that the fix is only in .NET Core, not in the grandfathered .NET Framework. – canton7 Mar 29 '21 at 09:46
  • @canton7 Why Microsoft? ...why? If that's the case though, I wonder if OP could simply move their code into a .NET Standard class library? I suppose that would just use the .NET Framework implementation, wouldn't it? – ProgrammingLlama Mar 29 '21 at 09:50
  • @canton7 thanks for your help, if you wish to summarise in an answer will mark it as the answer – Tom Gullen Mar 29 '21 at 09:50
  • @Llama unfortunately that is a big undertaking for us and not something we can do right now - am solo dev and project is ~105k lines of executable code (webforms :() – Tom Gullen Mar 29 '21 at 09:51
  • @Llama Development has been focussed on .NET Core for a long time now. If you want new features (and there are lots of fantastic new features) and minor bug-fixes, you need to be using .NET Core. .NET Framework is being kept for backwards compatibility only. Targeting .NET Standard wouldn't really help: what matters is the runtime that's being used to execute the code – canton7 Mar 29 '21 at 09:52

2 Answers


Your exception is being thrown because ZipArchive.Entries attempts to read the zip central directory in order to construct a ZipArchiveEntry for each entry. The ZipArchiveEntry constructor sets FullName, which calls Path.GetFileName (via ZipHelper.EndsWithDirChar), which in turn calls Path.CheckInvalidPathChars. That check throws if the path contains a character that is illegal in a Windows path, such as \0.

Although an exception part-way through reading the central directory will leave the ZipArchive with a partially-populated list of entries, there doesn't seem to be any way to read it without triggering a read of the central directory again, unless you use the debugger / reflection to read _entries or _entriesCollection. This will tell you which entries succeeded, which might narrow down the failing one a bit. If you're going to these lengths, debugging into the .Entries call would tell you exactly which entry is failing.
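The reflection route could look roughly like the sketch below. It assumes the private field is called `_entries` and holds the entries read so far, as described above; private fields are not a supported API and may be named differently in other runtime versions. `ZipArchiveDiagnostics` and `NamesReadSoFar` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO.Compression;
using System.Reflection;

static class ZipArchiveDiagnostics
{
    // Returns the FullName of every entry that was constructed before
    // reading the central directory failed (or all of them if it did not fail).
    public static List<string> NamesReadSoFar(ZipArchive archive)
    {
        try
        {
            // Triggers the central directory read; expected to throw for the affected archive.
            var unused = archive.Entries;
        }
        catch (ArgumentException)
        {
            // "Illegal characters in path." The partially populated list is inspected below.
        }

        var names = new List<string>();

        // ASSUMPTION: "_entries" is the private field name; it is not part of the public API.
        FieldInfo field = typeof(ZipArchive).GetField(
            "_entries", BindingFlags.Instance | BindingFlags.NonPublic);
        var entries = field == null ? null : field.GetValue(archive) as IEnumerable;
        if (entries == null)
        {
            return names; // Field missing or of an unexpected type: nothing to report.
        }

        foreach (object item in entries)
        {
            var entry = item as ZipArchiveEntry;
            if (entry != null)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }
}
```

The last name in the returned list is the entry immediately before the one that failed, which should help locate it in another tool.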

It looks like a related issue with paths being treated differently across different platforms was recognised in 2015, and as part of that a change was made to get the file name without calling Path.CheckInvalidPathChars, which should avoid the issue (you'll still get an exception if you try to write this entry to a file on Windows, but at least .Entries shouldn't throw). Unfortunately, this fix was only introduced in .NET Core.
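If you only need to find out which name is the problem, another option on .NET Framework is to read the names straight out of the central directory yourself and test them. The following is a minimal sketch rather than a full ZIP parser: `ZipNameScanner`, `ReadEntryNames` and `FindSuspectNames` are hypothetical names, and it assumes a single-disk archive without ZIP64 records, a little-endian host, and that `Path.GetInvalidPathChars()` is a reasonable approximation of the characters the framework rejects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

static class ZipNameScanner
{
    const uint EndOfCentralDirSignature = 0x06054b50;
    const uint CentralHeaderSignature = 0x02014b50;

    // Reads every entry name from the central directory without using ZipArchive.
    // ASSUMPTION: no ZIP64 records, and BitConverter runs on a little-endian machine.
    public static List<string> ReadEntryNames(byte[] zip)
    {
        // The end-of-central-directory record is at least 22 bytes long and sits
        // at the end of the file, so search backwards for its signature.
        int eocd = -1;
        for (int i = zip.Length - 22; i >= 0; i--)
        {
            if (BitConverter.ToUInt32(zip, i) == EndOfCentralDirSignature)
            {
                eocd = i;
                break;
            }
        }
        if (eocd < 0)
        {
            throw new InvalidDataException("End of central directory record not found.");
        }

        int entryCount = BitConverter.ToUInt16(zip, eocd + 10);   // total number of entries
        long pos = BitConverter.ToUInt32(zip, eocd + 16);          // offset of first central header

        var names = new List<string>();
        for (int n = 0; n < entryCount; n++)
        {
            if (pos + 46 > zip.Length ||
                BitConverter.ToUInt32(zip, (int)pos) != CentralHeaderSignature)
            {
                throw new InvalidDataException("Central directory header " + n + " is malformed.");
            }

            ushort flags = BitConverter.ToUInt16(zip, (int)pos + 8);
            int nameLength = BitConverter.ToUInt16(zip, (int)pos + 28);
            int extraLength = BitConverter.ToUInt16(zip, (int)pos + 30);
            int commentLength = BitConverter.ToUInt16(zip, (int)pos + 32);
            if (pos + 46 + nameLength > zip.Length)
            {
                throw new InvalidDataException("Entry name " + n + " runs past the end of the data.");
            }

            // General-purpose flag bit 11 marks a UTF-8 name; otherwise fall back to
            // the system default code page (an approximation of the framework's behaviour).
            Encoding encoding = (flags & 0x0800) != 0 ? Encoding.UTF8 : Encoding.Default;
            names.Add(encoding.GetString(zip, (int)pos + 46, nameLength));

            // Fixed 46-byte header, then name, extra field and comment.
            pos += 46 + nameLength + extraLength + commentLength;
        }
        return names;
    }

    // Returns the names containing at least one character from Path.GetInvalidPathChars().
    public static List<string> FindSuspectNames(IEnumerable<string> names)
    {
        char[] invalid = Path.GetInvalidPathChars();
        return names.Where(name => name.IndexOfAny(invalid) >= 0).ToList();
    }
}
```

With the `bytes` array from the question, `ZipNameScanner.FindSuspectNames(ZipNameScanner.ReadEntryNames(bytes))` would list the candidate names. Control characters will not print visibly, so it may help to log the character codes as well as the names.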

canton7

I agree with the comments. Are you sure your file is intact and originates from a compatible platform? What happens if you open your archive with e.g. WinRAR?

I tested the code below and successfully modified it to use a memory stream as you do, so I am sure your version should work as well.

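    [Test]
    public void TestZipper()
    {
        // Load the zip from disk into memory, then read it from the memory stream.
        using (var dataFromFile = new FileStream("d:\\lx\\MyZip.ZIP", FileMode.Open))
        using (var dataInMemory = new MemoryStream())
        {
            dataFromFile.CopyTo(dataInMemory);
            dataInMemory.Position = 0; // rewind after the copy
            using (var archive = new ZipArchive(dataInMemory))
            {
                foreach (var entry in archive.Entries)
                {
                    Debug.WriteLine(entry.FullName);
                }
                // MyZipListIsComplete is a helper from the test project (not shown).
                Assert.That(MyZipListIsComplete(archive.Entries));
            }
        }
    }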
Goodies
  • The .NET Framework library is validating the filename for the operating system it's on _when reading the entries_. Different operating systems have different filename and directory name requirements, and as such can produce zip files with internal filenames that are invalid on Windows. This is a bug in .NET, since it _should_ still be possible to read the zip file and the contained filenames without error, since the coder could simply extract them with Windows-friendly filenames. – ProgrammingLlama Mar 29 '21 at 09:39
  • What is a "compatible platform"? It's a zip file... OP isn't trying to extract the files. – ProgrammingLlama Mar 29 '21 at 09:40
  • Filenames are sometimes a pain in the ass. What if you are not interested in filenames, only in content? Maybe the zipper library has some enumerator that allows you to get content without accessing the filename? The other thing I can think of is encoding. – Goodies Mar 29 '21 at 09:41
  • The zip is generated from MacOS, and it appears to open just fine in Winrar on my Windows machine but as I understand it Winrar might be ignoring/not showing the affected folders/files. – Tom Gullen Mar 29 '21 at 09:47