
In my code I have a file database, httpcache.db, that my application reads into memory and writes back to disk in a single read/write FILE I/O API call. I read/write it using plain fopen/fread. I decided to compress it and see whether the reduction in size results in an overall speedup. First, I measured the time it takes to read httpcache.db and got fairly consistent times of 350 microseconds. The httpcache.db file is 500KB. Then I simply zipped it (httpcache.db.zip came to 24KB) and measured the time it takes to read the zipped file: 90 microseconds. However, according to my measurements it would take roughly 1000 microseconds to unzip this file (which would total 1090 microseconds vs 350).

Then I tried the lz4 compressor instead. The compressed size became 40KB. With lz4, however, it would take only 80 microseconds to decompress my original httpcache.db. It looked like a win: 90+80 microseconds vs 350 before lz4 compression. Just to make sure everything was OK, I made a final run to verify the numbers, and to my surprise loading the compressed 40KB file took the same amount of time as the original 500KB uncompressed file. I checked everything and didn't find issues with my code: somehow loading the 40KB or the 500KB file would take 350-400 microseconds, while the 24KB file would take 90. The only difference (other than file size) was the filename/extension. I renamed the lz4-compressed file from httpcache.db to httpcache.zip, and to my surprise simply changing the file extension made file I/O roughly four times faster: loading the 40KB httpcache.zip file would take 90 microseconds, as expected.

After trying different things, it seems that I get the slow read if the file's extension is .db or .bin, and fast I/O if the extension is .zip, .txt, or there is no extension at all.

Clearly, windows messes up somehow file io based on file extension (I use the latest Win10 Pro running in Boot Camp on a 2020 MacBook Pro 16). I disabled the antivirus for the folder where the file is located and still got the same results. Any ideas what's going on and why a file's extension affects file I/O this much?

This is the code I run to measure:

int main()
{
    std::string fdataZip, fdata;
    {
        static const char dbName[] = "../data/httpcache.db.zip"; // 24KB
        auto t0 = timeMicro();
        readFile(dbName, fdataZip);
        auto t1 = timeMicro();
        LOG("%s load time: %lld micro", dbName, t1 - t0);
    }

    {
        static const char dbName[] = "../data/httpcache.db"; // 40 KB
        auto t0 = timeMicro();
        readFile(dbName, fdata);
        auto t1 = timeMicro();
        LOG("%s load time: %lld micro", dbName, t1 - t0);
    }
}

and readFile is:

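// Reads the whole file into fileData; leaves it empty if the file cannot be opened.
void readFile(const char* fileName, std::string& fileData)
{
    fileData.clear();
    if (FILE* fl = fopen(fileName, "rb"))
    {
        // Determine the file size by seeking to the end.
        fseek(fl, 0, SEEK_END);
        long length = ftell(fl);
        fseek(fl, 0, SEEK_SET);
        if (length > 0)
        {
            fileData.resize(length);
            // Keep only the bytes actually read in case of a short read.
            size_t got = fread(&fileData[0], 1, fileData.size(), fl);
            fileData.resize(got);
        }
        fclose(fl);
    }
}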

timeMicro is implemented using QPC clock.
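
For reference, a minimal sketch of what a QPC-based timeMicro could look like. This is an assumption about its shape, not the exact implementation used for the measurements above; the fallback to std::chrono::steady_clock on non-Windows builds is added only so the sketch compiles elsewhere.

```cpp
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns a monotonic timestamp in microseconds.
// On Windows this reads the QPC counter directly; on other platforms it
// falls back to std::chrono::steady_clock (an assumption made for portability).
inline std::int64_t timeMicro()
{
#ifdef _WIN32
    // The counter frequency is fixed at boot, so query it only once.
    static const std::int64_t freq = [] {
        LARGE_INTEGER f;
        QueryPerformanceFrequency(&f);
        return static_cast<std::int64_t>(f.QuadPart);
    }();
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    const std::int64_t ticks = now.QuadPart;
    // Convert whole seconds and the remainder separately so that
    // ticks * 1000000 cannot overflow on long uptimes.
    return (ticks / freq) * 1000000 + (ticks % freq) * 1000000 / freq;
#else
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
#endif
}
```

Only differences between two timeMicro() values are meaningful; the absolute value has an arbitrary origin.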

Output from a sample run that I get:

   0:000 ... start
   0:002 ../data/httpcache.db.zip load time: 97 micro
   0:003 ../data/httpcache.db load time: 450 micro
Pavel P
  • "Clearly, windows messes up somehow file io based on file extension" - that's not a conclusion I think is obvious/clear based on what you've posted. – Jesper Juhl Mar 28 '20 at 18:17
  • You are best off posting the minimal amount of code that can point to the issue. Without code it's just hard to point at anything, really. I am not saying Windows doesn't do what you seem to think it does, but there tends to be an explanation. – Armen Michaeli Mar 28 '20 at 18:19
  • @JesperJuhl gist of what I wrote is: changing filename extension affects speed to read a file. – Pavel P Mar 28 '20 at 18:27
  • @amn sample code is added – Pavel P Mar 28 '20 at 18:28
  • I tried to disable AV and this eliminated the problem. – Pavel P Mar 28 '20 at 18:36

1 Answer


It looks like Windows Defender is the culprit. Even though all my work is located in a folder that I added to the exclusion list in the antivirus settings, and even though I also tried adding httpcache.db itself to the exclusions, it made no difference until I turned off real-time protection.

After that, the file's extension no longer affected file I/O speed:

   0:000 ... start
   0:002 ../data/httpcache.db.zip load time: 89 micro
   0:002 ../data/httpcache.db load time: 97 micro

In my case, renaming the file to httpcache (no extension) avoids the issue with Windows's AV, which solves the problem for me. It is kind of weird that a file's extension can affect that.
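
To check whether a particular extension triggers the slowdown on a given machine, the same bytes can be written under several names and each read timed. Below is a rough sketch of that idea. The names timedReadMicro, compareExtensions and ExtTiming are made up for illustration, it times with std::chrono::steady_clock rather than my timeMicro, and it reports the best of several reads, which is a different choice from the single-run numbers above (whether a scanner intervenes on every open or only the first one is not something this sketch assumes).

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Reads the whole file with fopen/fread and returns the elapsed time in
// microseconds, or -1 if the file cannot be opened.
long long timedReadMicro(const std::string& path, std::string& out)
{
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    FILE* fl = std::fopen(path.c_str(), "rb");
    if (!fl)
        return -1;
    std::fseek(fl, 0, SEEK_END);
    const long length = std::ftell(fl);
    std::fseek(fl, 0, SEEK_SET);
    out.clear();
    if (length > 0)
    {
        out.resize(static_cast<std::size_t>(length));
        // Shrink to the number of bytes actually read.
        out.resize(std::fread(&out[0], 1, out.size(), fl));
    }
    std::fclose(fl);
    const auto t1 = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

// Hypothetical result record: file path and its best read time.
struct ExtTiming
{
    std::string path;
    long long bestMicro;
};

// Writes the same payload to base + ext for every extension, then records
// the fastest of `runs` reads for each file.
std::vector<ExtTiming> compareExtensions(const std::string& base,
                                         const std::string& payload,
                                         const std::vector<std::string>& extensions,
                                         int runs)
{
    std::vector<ExtTiming> results;
    for (const std::string& ext : extensions)
    {
        const std::string path = base + ext;
        FILE* fl = std::fopen(path.c_str(), "wb");
        if (!fl)
            continue; // skip names that cannot be created
        std::fwrite(payload.data(), 1, payload.size(), fl);
        std::fclose(fl);

        long long best = -1;
        std::string data;
        for (int i = 0; i < runs; ++i)
        {
            const long long t = timedReadMicro(path, data);
            if (t >= 0 && (best < 0 || t < best))
                best = t;
        }
        results.push_back({path, best});
    }
    return results;
}
```

Calling compareExtensions("probe", std::string(40 * 1024, 'x'), {".db", ".bin", ".zip", ".txt", ""}, 10) and printing each path with its bestMicro, once with real-time protection on and once with it off, should show whether the extension matters on that setup.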

Pavel P