This code downloads a deflated XML document from https://api.bilibili.com/x/v1/dm/list.so?oid=162677333 and saves it to temp.Z, but the saved file seems to be broken. Why is that?

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <wininet.h>

#pragma comment(linker, "/entry:\"mainCRTStartup\"")
#pragma comment(lib, "wininet.lib")

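/* Download the resource at `link` into a malloc'd buffer.
   Returns the buffer (caller frees) and stores its length in *size,
   or returns NULL on failure. */
char *download(const char *link, int *size)
{
    int prealloc_size = 100000;
    char *buf, *tmp;
    DWORD num;
    HINTERNET hinet;
    HINTERNET hurl;
    *size = 0;

    /* The last argument of InternetOpen is dwFlags, not a port number. */
    hinet = InternetOpen("Microsoft Internet Explorer",
        INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    if (hinet == NULL)
        return NULL;

    hurl = InternetOpenUrl(hinet, link, NULL, 0, INTERNET_FLAG_NEED_FILE, 0);
    if (hurl == NULL)
    {
        InternetCloseHandle(hinet);
        return NULL;
    }

    buf = malloc(prealloc_size);

    /* Read in 1024-byte chunks; num == 0 signals the end of the data. */
    while (buf != NULL && InternetReadFile(hurl, buf + *size, 1024, &num) && num > 0)
    {
        *size += num;

        /* Keep room for the next 1024-byte read. */
        if (*size + 1024 > prealloc_size)
        {
            prealloc_size += prealloc_size / 2;
            tmp = realloc(buf, prealloc_size);
            if (tmp == NULL)
                free(buf);
            buf = tmp;
        }
    }

    InternetCloseHandle(hurl);
    InternetCloseHandle(hinet);
    return buf;
}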

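int main(void)
{
    char *link = "https://api.bilibili.com/x/v1/dm/list.so?oid=162677333";
    FILE *f;
    int siz;
    char *dat = download(link, &siz);

    if (dat == NULL)
        return 1;

    f = fopen("temp.Z", "wb");
    if (f == NULL)
    {
        free(dat);
        return 1;
    }

    /* Write the bytes exactly as received; they are still compressed. */
    fwrite(dat, 1, siz, f);
    fclose(f);
    free(dat);

    return 0;
}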

I tried Fiddler and it receives the same data; however, Fiddler can decode it and reports the encoding as deflate.

user26742873
  • The file I got from your program was identical to the one obtained via `wget` from the URL. It seems that either the file is broken on the server or you don't know how to handle it correctly. – MikeCAT Jul 28 '20 at 13:44
  • @MikeCAT Thank you. The browser handles it properly, so maybe there is some web trick involved. – user26742873 Jul 28 '20 at 13:46
  • @MikeCAT Your help makes great sense!!!! Thank you!! It is something between deflate, zlib and gzip. I don't know. But I can decode it now. – user26742873 Jul 28 '20 at 18:03
  • This is an old topic. See [one SO post](https://stackoverflow.com/questions/3932117/handling-http-contentencoding-deflate) and [this one](http://www.mail-archive.com/www-talk@w3.org/msg01000.html). Raw deflate has no header, so nothing in the data itself identifies the format. I have suggested that an archive manager developer add support for it, so that later on we can simply open such files with the archive manager. – user26742873 Aug 07 '20 at 16:39

1 Answer

It is something between deflate, zlib and gzip. I don't know. But I can decode it now.

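Just use zlib, with inflateInit2(&strm, -MAX_WBITS) instead of inflateInit(&strm). A negative windowBits value tells zlib to expect a raw deflate stream, without a zlib or gzip header.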

Yes, the downloaded file is perfectly fine. So why did I think it was broken? Because my archive manager cannot decode raw deflate data, so I have to call zlib myself. I have suggested that the archive manager's developers add this feature, which would be useful, no?

user26742873