I am starting a small key-value store project in C++, and I am wondering how C++ standard streams compare to mmap in terms of scalability and performance. How does using ifstream::seekg on a file that wouldn't fit in RAM compare to using mmap/lseek?
-
Why don't you make a small test and see? Anyway, there are a lot of variables, like portability, distribution, the actual problem to solve, and so on. – edmz Nov 29 '15 at 15:37
-
What kind of data? What size? What computer? – Basile Starynkevitch Nov 29 '15 at 15:46
-
This is basically covered in http://stackoverflow.com/questions/5588605/mmap-vs-read, although iostreams add their own overhead on top of read. – vitaut Nov 29 '15 at 15:58
1 Answer
Ultimately, every Linux user-land application uses syscalls(2), and so does the C++ I/O library.

With great care, mmap and madvise (or lseek + read and posix_fadvise) could be more efficient than C++ streams (which use read and other syscalls(2) under the hood); but misusing syscalls (e.g. reading with too small a buffer) can give catastrophic performance.
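To make the comparison concrete, here is a minimal sketch that fetches one record at a given index using each approach: `std::ifstream::seekg` + `read`, `lseek` + `read` with a `posix_fadvise` hint, and `mmap` with a `madvise` hint. It assumes Linux/POSIX and fixed-size records; `kRecordSize` and the three `read_record_*` names are hypothetical.

```cpp
// Sketch only: three ways to fetch one fixed-size record from a file.
// Assumes Linux/POSIX; kRecordSize and the function names are hypothetical.
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

constexpr std::size_t kRecordSize = 64;  // hypothetical fixed record size

// 1. C++ streams: seekg + read. The stream buffers data internally and
//    ultimately issues read() (and lseek()) syscalls itself.
std::string read_record_ifstream(const std::string& path, std::size_t index) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(static_cast<std::streamoff>(index * kRecordSize));
    std::string rec(kRecordSize, '\0');
    if (!in.read(&rec[0], static_cast<std::streamsize>(kRecordSize)))
        throw std::runtime_error("short read");
    return rec;
}

// 2. Raw syscalls: lseek + read, after hinting random access with
//    posix_fadvise so the kernel does less read-ahead.
std::string read_record_lseek(const std::string& path, std::size_t index) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    ::posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);  // advisory; result ignored
    std::string rec(kRecordSize, '\0');
    ssize_t n = -1;
    if (::lseek(fd, static_cast<off_t>(index * kRecordSize), SEEK_SET) >= 0)
        n = ::read(fd, &rec[0], kRecordSize);  // a robust version would loop on short reads
    ::close(fd);
    if (n != static_cast<ssize_t>(kRecordSize)) throw std::runtime_error("short read");
    return rec;
}

// 3. mmap + madvise: map the file, hint random access, copy the record out.
//    Page faults (served from the page cache) replace explicit read() calls.
std::string read_record_mmap(const std::string& path, std::size_t index) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); throw std::runtime_error("fstat failed"); }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    std::size_t offset = index * kRecordSize;
    if (offset + kRecordSize > size) { ::close(fd); throw std::runtime_error("out of range"); }
    void* base = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) throw std::runtime_error("mmap failed");
    ::madvise(base, size, MADV_RANDOM);  // advisory hint
    std::string rec(static_cast<const char*>(base) + offset, kRecordSize);
    ::munmap(base, size);
    return rec;
}
```

For brevity each function opens (and, for mmap, maps) the file on every call; a real store would open or map once and reuse the handle, otherwise that per-call setup dominates any comparison. On a 64-bit system, mapping a file larger than RAM is generally fine, because the kernel pages data in on demand.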
Also, Linux has a very good page cache, which keeps recently accessed file data in RAM. Performance also depends on the file system, and on the hardware and computer (SSDs and mechanical hard disks are different beasts).
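Because the page cache can dominate measurements, it helps to know how much of a file is already cached. A Linux-specific sketch using `mmap` plus `mincore(2)` that reports the fraction of a file's pages currently resident; the helper name `page_cache_residency` is hypothetical.

```cpp
// Sketch only (Linux-specific): fraction of a file's pages in the page cache.
// The helper name page_cache_residency is hypothetical.
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

double page_cache_residency(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); throw std::runtime_error("fstat failed"); }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) { ::close(fd); return 0.0; }  // mmap rejects zero-length mappings
    void* base = ::mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    ::close(fd);
    if (base == MAP_FAILED) throw std::runtime_error("mmap failed");

    // mincore fills one byte per page; bit 0 set means the page is resident.
    std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    std::size_t pages = (size + page - 1) / page;
    std::vector<unsigned char> vec(pages);
    int rc = ::mincore(base, size, vec.data());
    ::munmap(base, size);
    if (rc != 0) throw std::runtime_error("mincore failed");

    std::size_t resident = 0;
    for (unsigned char v : vec) resident += (v & 1u);
    assert(resident <= pages);
    return static_cast<double>(resident) / static_cast<double>(pages);
}
```

Calling it before and after a benchmark run shows whether you were measuring the disk or just RAM.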
Maybe you should not reinvent your own thing, and instead use sqlite, gdbm, redis, mongodb, postgresql, memcached, etc.
Performance and trade-offs depend strongly on the actual use (a single 4 GB log file on your laptop is not the same as petabytes of video or genomics data in a datacenter). So benchmark, and notice that many tools like the ones I mentioned can be tuned wisely.
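A small test, as the first comment suggests, is the most reliable answer. Below is a minimal timing harness, assuming each access method is wrapped in a callable that takes a record index and returns some value derived from the record. The names `random_indices`, `time_lookups` and `BenchResult` are hypothetical.

```cpp
// Sketch only: time random record lookups with any access method.
// Names (BenchResult, random_indices, time_lookups) are hypothetical.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct BenchResult {
    double seconds;          // wall-clock time for all lookups
    std::uint64_t checksum;  // sum of returned values; keeps the reads from being optimized away
};

// Generates `count` pseudo-random record indices in [0, num_records),
// reproducible for a given seed so every method sees the same access pattern.
std::vector<std::size_t> random_indices(std::size_t num_records, std::size_t count,
                                        unsigned seed) {
    assert(num_records > 0);
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::size_t> dist(0, num_records - 1);
    std::vector<std::size_t> out(count);
    for (auto& i : out) i = dist(rng);
    return out;
}

// Calls lookup(i) for every index and measures the elapsed time.
template <typename Lookup>
BenchResult time_lookups(Lookup lookup, const std::vector<std::size_t>& indices) {
    std::uint64_t sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i : indices) sum += lookup(i);
    auto end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(end - start).count(), sum};
}
```

Feed the same index list to each access method and compare `seconds`; matching `checksum` values confirm they read the same data. Results differ a lot between a cold and a warm page cache: on Linux, running `sync; echo 3 > /proc/sys/vm/drop_caches` as root before a run gives cold-cache numbers.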
