In my code I have a file database, httpcache.db, that my application reads into memory and writes back to disk with a single read/write FILE I/O API call, using plain fopen/fread. I decided to compress it to see whether the reduction in size results in an overall speed-up. First I measured the load time it takes to read httpcache.db and got fairly consistent times of 350 microseconds; the file is 500KB. Then I simply zipped it (httpcache.db.zip came out at 24KB) and measured the time it takes to read the zipped file: 90 microseconds. However, according to my measurements it would take roughly 1000 microseconds to unzip this file, which would total 1090 microseconds vs the original 350.
Then I tried the lz4 compressor instead. The compressed size became 40KB, and with lz4 it takes only 80 microseconds to decompress back to the original httpcache.db. That looked like a clear overall win: 90+80 microseconds vs 350 before compression. Just to make sure everything was OK, I made a final run to verify the numbers, and to my surprise loading the compressed 40KB file took the same amount of time as the original 500KB uncompressed file. I checked everything and found no issues in my code: somehow loading either the 40KB or the 500KB file took 350-400 microseconds, while the 24KB file took 90. The only difference (other than file size) was the filename/extension. I renamed the lz4-compressed file from httpcache.db to httpcache.zip, and simply changing the extension "boosted" file I/O by roughly 4x: loading the 40KB httpcache.zip file takes the expected 90 microseconds.
After trying different things, it seems I get the slow reads if the file's extension is .db or .bin, and fast I/O if the extension is .zip, .txt, or there is no extension at all.
Clearly, Windows is somehow interfering with file I/O based on the file's extension (I'm on the latest Windows 10 Pro, running in Boot Camp on a 2020 MacBook Pro 16). I disabled antivirus for the folder where the file is located and still got the same results. Any ideas what's going on, and why a file's extension affects file I/O this much?
This is the code I run to measure:
int main()
{
    std::string fdataZip, fdata;
    {
        static const char dbName[] = "../data/httpcache.db.zip"; // 24KB
        auto t0 = timeMicro();
        readFile(dbName, fdataZip);
        auto t1 = timeMicro();
        LOG("%s load time: %lld micro", dbName, t1 - t0);
    }
    {
        static const char dbName[] = "../data/httpcache.db"; // 40KB
        auto t0 = timeMicro();
        readFile(dbName, fdata);
        auto t1 = timeMicro();
        LOG("%s load time: %lld micro", dbName, t1 - t0);
    }
}
and readFile is:
void readFile(const char* fileName, std::string& fileData)
{
    fileData.clear();
    if (FILE* fl = fopen(fileName, "rb"))
    {
        fseek(fl, 0, SEEK_END);
        long length = ftell(fl);
        fseek(fl, 0, SEEK_SET);
        if (length > 0)
        {
            fileData.resize(length);
            size_t got = fread(&fileData[0], 1, length, fl);
            if (got != (size_t)length)
                fileData.resize(got); // short read: keep only the bytes we got
        }
        fclose(fl);
    }
}
timeMicro is implemented using the QPC clock (QueryPerformanceCounter).
Output from a sample run:
0:000 ... start
0:002 ../data/httpcache.db.zip load time: 97 micro
0:003 ../data/httpcache.db load time: 450 micro