I have uploaded a zip file to GitHub. It was created with 7-Zip's "Add to .zip" option and contains only a file named text.txt. How can I read the content of text.txt without writing it to disk?

I'm downloading the zip to memory using curl:

    #include <curl/curl.h>
    #include <tchar.h>   // _tmain / _TCHAR (Windows only)

    #include <iostream>
    #include <string>

    static size_t WriteMemoryCallback(void* contents, size_t size, size_t nmemb,
                                      void* userp) {
        size_t realsize = size * nmemb;
        auto& mem = *static_cast<std::string*>(userp);
        mem.append(static_cast<char*>(contents), realsize);
        return realsize;
    }
    
    std::string Download(const std::string& url) 
    {
        CURL* curl_handle;
        CURLcode res;
    
        std::string chunk;
    
        curl_global_init(CURL_GLOBAL_ALL);
    
        curl_handle = curl_easy_init();
        curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
        curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &chunk);
        curl_easy_setopt(curl_handle, CURLOPT_USERAGENT, "libcurl-agent/1.0");
    
        // added options that may be required
        curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L);  // redirects
        curl_easy_setopt(curl_handle, CURLOPT_HTTPPROXYTUNNEL, 1L); // corp. proxies etc.
        curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L); // we want it all
        // curl_easy_setopt(curl_handle, CURLOPT_REDIR_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
    
        res = curl_easy_perform(curl_handle);
    
        if(res != CURLE_OK) {
            std::cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << '\n';
        } else {
            std::cout << chunk.size() << " bytes retrieved\n";
        }
    
        curl_easy_cleanup(curl_handle);
        curl_global_cleanup();
    
        return chunk;
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        std::string link = "https://github.com/.../main/text.zip";
        auto data = Download(link);
    }

While searching for a library capable of decompressing the zip in memory, I found libzip (any library is welcome).

Searching for examples, I found this answer; however, it loads a zip from disk into memory and then reads its content.

How can I read the zip downloaded with curl that is held in the string data?

When I inspect the content of data in the debugger, it starts with PK. I tried passing it to zip_open, but z comes back null:

        //Open the ZIP archive
        int err = 0;
        zip *z = zip_open(data.c_str(), 0, &err);
    
        //Search for the file of given name
        const char *name = "text.txt";
        struct zip_stat st;
        zip_stat_init(&st);
        zip_stat(z, name, 0, &st);
    
        //Alloc memory for its uncompressed contents
        char *contents = new char[st.size];
    
        //Read the compressed file
        zip_file *f = zip_fopen(z, name, 0);
        zip_fread(f, contents, st.size);
        zip_fclose(f);
Cesar
  • zlib does just about anything, and if I remember correctly, unzipping a file that's a blob of memory's in there. – user4581301 Sep 12 '22 at 20:55
  • I'm not familiar with ziplib, but [`zip_open_from_source`](https://libzip.org/documentation/zip_open_from_source.html) with [`zip_source_buffer`](https://libzip.org/documentation/zip_source_buffer.html) look like what you want just from reading the docs. – Miles Budnek Sep 12 '22 at 20:56
  • @user4581301 it's not zlib... – Cesar Sep 12 '22 at 20:58
  • RAR is not the same format as Zip, it looks like you have the former, but the library you have chosen only supports the latter. – john Sep 12 '22 at 20:58
  • Does `curl` even have to do with the question? You have successfully downloaded the file and have it in memory, correct? If so, remove everything that has to do with `curl` from the question. – Ted Lyngmo Sep 12 '22 at 21:16
  • Daniel: If you save the data you have in memory (in `chunk` after `curl_easy_perform`) to disk and then use `unzip -t` on the file, does it display the content properly? Since it says `Rar!` I think you've created a `RAR` archive and not a `Zip` archive. – Ted Lyngmo Sep 12 '22 at 21:21
  • @TedLyngmo i uploaded a new zip compressed with `7.zip` option "add to zip" – Cesar Sep 12 '22 at 22:12
  • @Daniel Yes, that's a proper zip file. The previous was a RAR - I checked. So, if you save `chunk` to a file after `curl_easy_perform` and then do, from the command line, `unzip text.zip` , do you get `text.txt` with the content `hello world`? – Ted Lyngmo Sep 12 '22 at 22:42
  • @TedLyngmo i'm not sure what you mean, if i modify the function to save the content downloaded to disk, it is saved correctly, i can open and extract/read the text file, see: https://pastebin.com/H6Z1YUHL – Cesar Sep 12 '22 at 22:51
  • Yes, that's exactly what I mean. So if you can do that, then the `curl` part of your question isn't relevant, right? The real question is: _"How can I use `libzip` to read a zip archive stored in memory?"_. Agree? – Ted Lyngmo Sep 12 '22 at 22:54
  • Not really, I do think it's relevant as it's showing how the data was stored in memory, and it can be any lib, not necessarily `libzip` – Cesar Sep 12 '22 at 22:57
  • @Daniel No, the data stored in memory is just Zip data that we verified is valid. How it got there is of no relevance. – Ted Lyngmo Sep 12 '22 at 23:01
  • @TedLyngmo all this rage just because he add the tag `curl`? lmao – Cesar Sep 12 '22 at 23:05
  • @Nathan :-) No, not the tag. It's the question itself that consists mostly of `curl` code. All of it should be removed. – Ted Lyngmo Sep 12 '22 at 23:06
  • @Daniel Please let me know if you have any problems with the steps in my answer. I tested it on both your single-file archive and some more complex archives containing multiple files and directories and it worked fine. – Ted Lyngmo Sep 12 '22 at 23:48
  • @TedLyngmo Thank you! I was not getting it to work when reading an `exe`, but I found another question which, to my surprise, you had also answered, and that helped me get it working!! – Cesar Sep 13 '22 at 03:08
  • @Daniel You're welcome - and what a coincidence. :-) – Ted Lyngmo Sep 13 '22 at 06:52

1 Answer

I'm ignoring everything about `curl` in the question because we've verified that you've got the zip file stored in memory correctly.
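That verification was done by saving the buffer to disk and running `unzip` on it. A cheaper sanity check can be done on the bytes in memory, and it would also have caught the earlier RAR upload. The following is a minimal sketch, assuming C++17; the names `ArchiveKind` and `SniffArchive` are made up for illustration and are not part of libzip or any other library. It only inspects the leading magic bytes, so treat it as a heuristic: a zip with data prepended (a self-extracting archive, for example) would be reported as unknown.

```cpp
#include <string_view>

// Hypothetical names, for illustration only.
enum class ArchiveKind { Zip, EmptyZip, Rar, Unknown };

// Classify a buffer by its first four bytes.
ArchiveKind SniffArchive(std::string_view data) {
    using namespace std::string_view_literals;

    const std::string_view magic = data.substr(0, 4);

    // A zip normally starts with a local file header signature
    if(magic == "PK\x03\x04"sv) return ArchiveKind::Zip;

    // An archive without entries consists only of the
    // end-of-central-directory record
    if(magic == "PK\x05\x06"sv) return ArchiveKind::EmptyZip;

    // RAR archives start with the text "Rar!"
    if(magic == "Rar!"sv) return ArchiveKind::Rar;

    return ArchiveKind::Unknown;
}
```

Calling `SniffArchive(chunk)` right after `curl_easy_perform` should yield `ArchiveKind::Zip` for an archive like the one in the question, since a `std::string` converts implicitly to `std::string_view`.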

How can I read the zip ... that is held in the string data?

Since you have the whole file stored in memory, you need to create a `zip_source` from `chunk.data()` (`chunk` is the string returned by `Download`, called `data` in your `_tmain`), open the archive using that `zip_source`, and then open the individual files in the archive.

Here's how (without error checking - you need to add that):

{
    // ...
    zip_error_t ze; // for errors
    zip_error_init(&ze);

    // create a zip_source from the data you've stored in memory
    zip_source_t* zs = zip_source_buffer_create(chunk.data(), chunk.size(), 0, &ze);

    // open the archive from the zip_source
    zip_t* zip = zip_open_from_source(zs, ZIP_CHECKCONS | ZIP_RDONLY, &ze);

    // read how many files you've got in there
    zip_int64_t entries = zip_get_num_entries(zip, 0);

    std::cout << entries << '\n';

    // loop over the entries in the archive
    for(zip_int64_t idx = 0; idx < entries; ++idx) {
        std::cout << zip_get_name(zip, idx, ZIP_FL_ENC_STRICT) << '\n';

        // open the file at this index
        zip_file_t* fp = zip_fopen_index(zip, idx, 0);

        // process the file
        zip_int64_t len;
        char buf[1024];
        while((len = zip_fread(fp, buf, sizeof buf)) > 0) {
            std::cout << "read " << len << " bytes\n";
            // do something with the `len` bytes you have in `buf`
        }
        zip_fclose(fp); // close this file
    }
    zip_close(zip); // close the whole archive
}
Ted Lyngmo