11

Is there a generally accepted fastest technique for reading a file into memory in C++?

I will only be reading the file.

I have seen that Boost has an implementation, and I have seen a couple of other implementations on here, but I would like to know which is considered the fastest.

Thank you in advance

In case it matters, I am considering files up to 1 GB, and this is for Windows.

John Dibling
intrigued_66
  • The fastest way is to read contiguous blocks whose size is aligned with the buffer of the disk (e.g. 8MB, if your disk has an 8MB buffer). – Kiril May 31 '12 at 15:32
  • 2
    Does it really matter? **Are you sure**? Have you profiled your code and proved that how long it takes to read the file is a problem? If so, you will probably need to use OS-specific facilities to get maximum performance. – John Dibling May 31 '12 at 15:34
  • 1
    It would help if you say what exactly you want to do with the file. – Shahbaz May 31 '12 at 15:44
  • possible duplicate of [What is the best way to slurp a file into a std::string in c++?](http://stackoverflow.com/questions/116038/what-is-the-best-way-to-slurp-a-file-into-a-stdstring-in-c) – Robᵩ May 31 '12 at 15:55
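
As a baseline for the block-read suggestion in the comments, here is a minimal sketch in standard C++ that sizes the buffer once and then reads the file in large contiguous blocks. The helper name `read_file_in_blocks` and the 8 MB default are assumptions for illustration only; the best block size depends on your disk and OS, so measure before settling on one.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into one buffer, block by block.
std::vector<char> read_file_in_blocks(const std::string& path,
                                      std::size_t block_size = 8u * 1024u * 1024u) {
    // Binary mode avoids newline translation on Windows; ios::ate opens at
    // the end of the file so tellg() reports its size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamoff end = in.tellg();
    if (end < 0) throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);

    // Allocate once so the vector never reallocates while reading.
    std::vector<char> buffer(static_cast<std::size_t>(end));

    std::size_t done = 0;
    while (done < buffer.size()) {
        const std::size_t want = std::min(block_size, buffer.size() - done);
        in.read(buffer.data() + done, static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) throw std::runtime_error("unexpected end of file: " + path);
        done += got;
    }
    return buffer;
}
```

Calling `read_file_in_blocks("data.bin")` returns the file contents as a `std::vector<char>`. This is only a portable starting point to profile against, not necessarily the fastest option on Windows.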

4 Answers

5

Use memory-mapped files, maybe using the boost wrapper for portability.

If you want to read files bigger than the free, contiguous portion of your virtual address space, you can move the mapped portion of the file at will.

Matteo Italia
  • 123,740
  • 17
  • 206
  • 299
  • Will I have to pre-compile the header for that boost library or is it done for you? I remember some of the boost libraries require separate compilation? – intrigued_66 May 31 '12 at 16:28
  • @Porcupine: the documentation says "To manage mapped files, you just need to include the following header: `#include `", so I suppose it's a header-only library. – Matteo Italia May 31 '12 at 16:30
  • Ok, it's definitely a header-only library, as specified [here](http://www.boost.org/doc/libs/1_49_0/doc/html/interprocess.html#interprocess.intro.introduction_building_interprocess): "There is no need to compile Boost.Interprocess, since it's a header only library. Just include your Boost header directory in your compiler include path." – Matteo Italia May 31 '12 at 16:32
3

Consider using memory-mapped files for your case, as the files can be up to 1 GB in size.

You can start with the Win32 API here:

There are several other helpful APIs on the MSDN page.

Nawaz
  • 353,942
  • 115
  • 666
  • 851
  • is this just istream? is istream what i want? – intrigued_66 May 31 '12 at 16:48
  • @Porcupine: it is not `istream`. Go through the link to learn what it is. If I explain it in a few words, it will not do you any good, and it could possibly be inaccurate. – Nawaz May 31 '12 at 17:27
1

In the event memory-mapped files are not adequate for your application, and file I/O is your bottleneck, using an I/O completion port to handle async I/O on the files will be the fastest you can get on Windows.

I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system. When a process creates an I/O completion port, the system creates an associated queue object for requests whose sole purpose is to service these requests. Processes that handle many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports in conjunction with a pre-allocated thread pool than by creating threads at the time they receive an I/O request.

Steve Townsend
  • 53,498
  • 9
  • 91
  • 140
  • 1
    But reading a single file doesn't really sound to me like »many concurrent asynchronous I/O requests«. – Joey May 31 '12 at 16:00
  • Agreed, if this is for a single sub-1GB file it's overkill. I don't want to assume what OP wants, based on the ambiguous text of the q. – Steve Townsend May 31 '12 at 16:06
0

Generally speaking, mmap it is. But on Windows they have invented their own way of doing this, see "File Mapping". Boost has a memory-mapped files library that wraps both ways under a portable pile of code.

Also, you have to optimize for your use case if you want to be fast. Just mapping file contents into memory is not enough. It is quite possible that you don't need memory-mapped files and would be better off using async file I/O, for example. There are many solutions for many problems.