
I want to write a function that explores a ZIP file and finds out whether it contains a .png file. The problem is that it should also explore any zip files nested within the parent zip (including zip files inside other zip files and inside folders).

As if that weren't painful enough, the task must be done without extracting any of the zip files, parent or children.

I would like to write something like this (semi-pseudocode):

```csharp
public bool findPng(string zipPath)
{
    bool flag = false;
    using (ZipArchive archive = ZipFile.OpenRead(zipPath))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            string s = entry.FullName;
            if (s.EndsWith(".zip"))
            {
                /* recursively calling findPng
                   (s is the entry's path inside the archive, not a file on disk) */
                flag = findPng(s);
                if (flag == true)
                    return true;
            }
            /* same as above with folders within the zip */

            if (s.EndsWith(".png"))
                return true;
        }
        return false;
    }
}
```

The problem is that I can't find a way to explore the inner zip files without extracting them, and not extracting them is a hard requirement.

Thanks in advance!

guyr79
  • There's no way you can explore inner zip files without any extraction. Opening a stream from the individual `.zip` entry and creating a new `ZipArchive` object with it ought to allow you to do it with the minimal extraction necessary. – ProgrammingLlama May 06 '20 at 13:35
  • I've reinterpreted your question to mean that you don't want to extract the inner zip to a file, rather than objecting to its decompression (which is what I thought you meant by extraction) and believe that [this answer](https://stackoverflow.com/a/14753848/3181933) of the linked question, along with its first comment is what you need. – ProgrammingLlama May 06 '20 at 13:39
  • If the linked duplicate doesn't answer your question, please edit your question to explain how and then tag me in a comment (@john) and I'll re-open your question. – ProgrammingLlama May 06 '20 at 13:41
  • Thank you John. The link does not answer my question, as I specifically pointed out the no-extraction requirement. The link you gave me does not help in this regard, so my question is not a duplicate but a different question with different needs (I saw that question before). As for your answer: are you 100% sure it cannot be done? Maybe another Stack Overflow user might think it can be done? – guyr79 May 06 '20 at 13:43
  • When you say "extraction", do you mean to a file, or you mean you don't want to extract any of the inner zip file even in memory? – ProgrammingLlama May 06 '20 at 13:43
  • The only thing I need is to read the types of the files within the zip file. The problem is that a zip file might contain other zip files. Do I have to extract the zip file in order to read its entries' names? I don't know about extraction to memory. – guyr79 May 06 '20 at 13:48
  • Well, the only way you can do it is by reading it, and that requires partial extraction in memory or else you're not reading the file, right? I've changed the question I think yours duplicates, since it seems like what you need instead. See [this answer](https://stackoverflow.com/a/21767078/3181933). All you need to do is open a stream from the entry. – ProgrammingLlama May 06 '20 at 13:53
  • The parent zip file does not hold a ToC for nested zip files. Therefore you would need to at least read the inner zip file, meaning that you need to partially extract it, if only into memory by exposing it as a stream. As far as the parent zip file is concerned, the inner zip file is just another file. I told you in my first comment how this could be done if you are happy with this, and the new linked duplicate + answer I linked shows you how to do that. I'd say that already gives you the answer you're looking for, no? – ProgrammingLlama May 06 '20 at 13:58
  • It depends on the zip file. Under 'normal' compression, contained zip files will be stored 'as is' (`COMP_STORED`), and the file list inside those files can still be read without decompressing or extracting the file (each zip file contains a `dirEntry` list). This will require an understanding of the zip file structure and some user code (I don't think there's a library for that). – Danny_ds May 06 '20 at 14:01
  • @Danny_ds Thanks. This is helpful – guyr79 May 06 '20 at 14:03
  • I reopened your question and added an answer, although I don't feel I've really added anything much over the other question I linked yours with. Hopefully it helps. If it still doesn't answer your question, please edit the question to explain why it doesn't. – ProgrammingLlama May 06 '20 at 14:11
  • I have a command-line unzip replacement in the works that does a recursive unzip to any depth. The program works by unzipping each layer in streaming mode, so it doesn't need to use temporary files or store the whole zip files in memory. This technique works because although a zip file contains a ToC at the end of the file, it is also possible to walk the zip file sequentially, processing each member as it encounters it. – pmqs May 06 '20 at 14:29
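pmqs's comment describes walking a zip file front to back instead of jumping to its table of contents (the central directory) at the end. The following is a minimal C# sketch of that idea, not pmqs's tool: it reads the local file headers in order from a forward-only stream, and when an entry named `*.zip` is stored (method 0) or deflated (method 8), it recurses into that entry's bytes as they stream past. Nothing is written to disk and no archive is held in memory as a whole; a stored inner zip (the `COMP_STORED` case Danny_ds mentions) is walked without any decompression. `ZipStreamWalker` and `BoundedStream` are hypothetical names. The sketch assumes each entry records its compressed size in the local header: entries written with a trailing data descriptor (general-purpose flag bit 3) or with ZIP64 sizes make it throw, and encrypted entries are not searched.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZipStreamWalker
{
    const uint LocalFileHeaderSignature = 0x04034b50;

    // Reads local file headers sequentially (no seeking) and returns true as soon as
    // an entry whose name ends in ".png" is seen, at any nesting depth.
    public static bool ContainsPng(Stream input)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            while (true)
            {
                uint signature;
                try { signature = reader.ReadUInt32(); }
                catch (EndOfStreamException) { return false; }
                // Anything other than a local header (normally the central directory) ends the walk.
                if (signature != LocalFileHeaderSignature) return false;

                reader.ReadUInt16();                      // version needed to extract
                ushort flags = reader.ReadUInt16();
                ushort method = reader.ReadUInt16();      // 0 = stored, 8 = deflate
                reader.ReadUInt32();                      // last-modified time and date
                reader.ReadUInt32();                      // CRC-32
                long compressedSize = reader.ReadUInt32();
                reader.ReadUInt32();                      // uncompressed size
                ushort nameLength = reader.ReadUInt16();
                ushort extraLength = reader.ReadUInt16();
                string name = Encoding.UTF8.GetString(reader.ReadBytes(nameLength));
                reader.ReadBytes(extraLength);            // skip the extra field

                if (name.EndsWith(".png", StringComparison.OrdinalIgnoreCase))
                    return true;

                // Assumption of this sketch: the size in the local header is real.
                if ((flags & 0x08) != 0 || compressedSize == 0xFFFFFFFF)
                    throw new NotSupportedException($"'{name}': data descriptor or ZIP64 sizes are not handled.");

                // Expose only this entry's bytes so the recursion cannot read past it.
                var body = new BoundedStream(input, compressedSize);
                bool encrypted = (flags & 0x01) != 0;
                if (!encrypted && name.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                {
                    if (method == 0 && ContainsPng(body))
                        return true;
                    if (method == 8)
                    {
                        using (var inflated = new DeflateStream(body, CompressionMode.Decompress, leaveOpen: true))
                        {
                            if (ContainsPng(inflated))
                                return true;
                        }
                    }
                }
                body.CopyTo(Stream.Null);                 // skip whatever is left of this entry
            }
        }
    }

    // Read-only, forward-only view over the next `length` bytes of another stream.
    sealed class BoundedStream : Stream
    {
        readonly Stream _inner;
        long _remaining;

        public BoundedStream(Stream inner, long length) { _inner = inner; _remaining = length; }

        public override bool CanRead => true;
        public override bool CanSeek => false;
        public override bool CanWrite => false;
        public override long Length => throw new NotSupportedException();
        public override long Position
        {
            get => throw new NotSupportedException();
            set => throw new NotSupportedException();
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            if (count == 0 || _remaining == 0) return 0;
            int read = _inner.Read(buffer, offset, (int)Math.Min(count, _remaining));
            if (read == 0) throw new EndOfStreamException("The zip entry is truncated.");
            _remaining -= read;
            return read;
        }

        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
        public override void SetLength(long value) => throw new NotSupportedException();
        public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    }
}
```

Because it never seeks, `ZipStreamWalker.ContainsPng` can be handed a `FileStream`, a network stream, or another entry's stream; the trade-off against `ZipArchive` is that it has to trust the local headers instead of the central directory.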

1 Answer


As I pointed out in the question I marked yours as a duplicate of, you need to open the inner zip file.

I'd change your "open from file" method to be like this:

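```csharp
using System.IO.Compression;   // ZipArchive, ZipArchiveEntry, ZipFile (ZipFile may need a reference to
                               // System.IO.Compression.ZipFile, or System.IO.Compression.FileSystem on .NET Framework)

// Open ZipArchive from a file
public bool findPng(string zipPath)
{
    using (ZipArchive archive = ZipFile.OpenRead(zipPath))
    {
        return findPng(archive);
    }
}
```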

Then add a separate method that takes a `ZipArchive`, so that you can call it recursively by opening each nested entry as a `Stream`, as demonstrated here:

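```csharp
// Search ZipArchive for PNG
public bool findPng(ZipArchive archive)
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        string s = entry.FullName;
        if (s.EndsWith(".zip", System.StringComparison.OrdinalIgnoreCase))
        {
            // Open inner zip from the entry's stream and pass it to the same method.
            // (ZipArchive copies a non-seekable stream like this into memory; nothing is written to disk.)
            using (ZipArchive innerArchive = new ZipArchive(entry.Open()))
            {
                if (findPng(innerArchive))
                    return true;
            }
        }
        // No separate folder handling is needed: Entries already lists files in sub-folders,
        // and FullName holds their path relative to the archive root (e.g. "images/logo.png").

        if (s.EndsWith(".png", System.StringComparison.OrdinalIgnoreCase))
            return true;
    }
    return false;
}
```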

As an optimisation, I would recommend checking all of the file names in an archive before opening any nested zip files, so that a PNG at the current level is found without decompressing anything.
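A minimal sketch of that two-pass optimisation, using the same `System.IO.Compression` types as the answer; the class `PngFinder` and the methods `ContainsPng` and `HasExtension` are hypothetical names. The methods are `static`, which the asker reports needing in the comments, and the extension checks ignore case so that `LOGO.PNG` also matches.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PngFinder
{
    // Entry point for a zip file on disk.
    public static bool ContainsPng(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            return ContainsPng(archive);
        }
    }

    // Pass 1 checks every name at this level; pass 2 opens nested zips only if pass 1 found nothing.
    public static bool ContainsPng(ZipArchive archive)
    {
        if (archive.Entries.Any(e => HasExtension(e.FullName, ".png")))
            return true;

        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (!HasExtension(entry.FullName, ".zip"))
                continue;

            // The nested zip is decompressed into memory by ZipArchive; nothing is written to disk.
            using (ZipArchive inner = new ZipArchive(entry.Open(), ZipArchiveMode.Read))
            {
                if (ContainsPng(inner))
                    return true;
            }
        }
        return false;
    }

    static bool HasExtension(string name, string extension) =>
        name.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
}
```

The first pass only reads entry names from the central directory, so a PNG at the top level is found without opening any nested archive; the recursion in the second pass applies the same order at every depth.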

ProgrammingLlama
  • Hi @john. Thank you for the extensive help. I will look into your code and see if it works for me – guyr79 May 06 '20 at 14:18
  • Hi Again @john. Wanted to let you know that your code works perfectly and helped me a lot (only needed to add "static" to the function declarations). Thank you very much for the care and extensive help!!! – guyr79 May 07 '20 at 14:12