1

I would like to read a large file that has a structure similar to the following:

        John  10  department
        Hello 14   kjezlkjzlkj
        jhfekh 144 lkjzlkjrzlj
        ........

The problem is that I want to minimize the number of disk I/O accesses while reading this file in C++. Is there a way to read a large portion of the file into memory (that is one disk access), then read a second large portion (a second disk access), and so on?

Any help will be appreciated.

John
  • I tried reading the file line by line, but I think reading every line will lead to a disk access, right? – John Dec 02 '12 at 13:52
  • Depends on the method you're using to read the line, I suspect. – Will A Dec 02 '12 at 13:53
  • @John, "reading every line will lead to a disk access" - no. The OS will cache files and buffer reads, especially sequential ones. It is unlikely you will see any significant speedup by simply adding buffering in the program. You would have to change your parsing algorithm considerably to take advantage of a buffer. – Tino Didriksen Dec 02 '12 at 14:04

3 Answers

3

Just create a large buffer and fill it up with one read. Repeat if necessary.

The stdio streams already implement this. You can open the file with fopen and then set the buffer with setbuffer (a BSD extension) or with the standard setvbuf.

EDIT

It is rather simple

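   #include <stdio.h>

   /* 5 MB - increase or decrease this to your heart's content */
   #define BUFFER_SIZE 5242880

   /* static storage: 5 MB is too large for the stack on many systems */
   static char buffer[BUFFER_SIZE];

   FILE *file = fopen("filename", "r");
   /* setbuffer() is a BSD extension; setvbuf() is the standard C equivalent */
   if (file != NULL)
       setvbuf(file, buffer, _IOFBF, BUFFER_SIZE);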

Then read with any of the usual stdio functions: fscanf, fgets, etc.

EDIT

Sorry, I did not notice it was C++.

Here is the code for C++:

#include <iostream>
#include <fstream>
using namespace std;

...

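const int BUFFER_SIZE = 5242880;  // 5 MB

// static storage: a 5 MB local array may overflow the stack
static char buffer[BUFFER_SIZE];

filebuf fb;
// setbuf() is protected in filebuf; pubsetbuf() is the public entry point.
// Call it before open() so the buffer is in place before any I/O happens.
fb.pubsetbuf(buffer, BUFFER_SIZE);
fb.open("test.txt", ios::in);
istream is(&fb);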

Then you can read with the usual extraction operators, for example int i; is >> i; and so on.
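To make the C++ variant concrete, here is a minimal sketch that parses lines shaped like the sample in the question through a filebuf with a caller-chosen buffer size. The `Record` struct and the `read_records` function are hypothetical names introduced only for this illustration, and the sketch assumes every line holds exactly three whitespace-separated fields. Note that the effect of `pubsetbuf` on a file buffer is implementation-defined, so the library may or may not use the whole buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Hypothetical record type matching the "name number text" sample lines.
struct Record {
    std::string name;
    int number;
    std::string department;
};

// Hypothetical helper: reads all records from `path` through a stream
// buffer of `buffer_size` bytes. Returns an empty vector if open fails.
std::vector<Record> read_records(const std::string& path, std::size_t buffer_size) {
    assert(buffer_size > 0);

    // Heap storage avoids stack overflow for multi-megabyte buffers.
    // Declared before the filebuf so it is destroyed after it.
    std::vector<char> buffer(buffer_size);

    std::filebuf fb;
    // Install the buffer before open(), i.e. before any I/O takes place.
    fb.pubsetbuf(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    if (!fb.open(path, std::ios::in)) {
        return {};
    }
    std::istream in(&fb);

    std::vector<Record> records;
    Record r;
    // Extraction stops at end of file or at the first malformed line.
    while (in >> r.name >> r.number >> r.department) {
        records.push_back(r);
    }
    return records;
}
```

A call such as `read_records("test.txt", 5242880)` would then request a 5 MB buffer. Whether this is measurably faster than the default buffer depends on the platform, since the OS caches sequential reads anyway.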

Happy now Tino Didriksen

Ed Heal
2

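In a C++ iostream, you can increase the buffer with rdbuf and pubsetbuf:

ifstream f;
char buf[4096];  // pick a larger size if you want fewer refills
// call this before opening the file so the buffer takes effect
f.rdbuf()->pubsetbuf(buf, sizeof(buf));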
Olaf Dietsche
0

It depends on the operating system. First, you may want to use large buffers. See this question. (It also depends on whether the reading is sequential.)

Or you could use lower-level system calls, like mmap on Linux or other POSIX systems (or at least read with large, megabyte-sized buffers).
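The "read with large buffers" option can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: `read_lines_chunked` is a hypothetical helper name, lines are assumed to end in `'\n'`, and each `read()` call merely requests one chunk from the stream; how many actual disk accesses that causes is up to the library and the OS cache.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in chunks of `chunk_size` bytes and
// returns its lines. A line that straddles two chunks is carried over
// in `pending` until its newline arrives.
std::vector<std::string> read_lines_chunked(const std::string& path, std::size_t chunk_size) {
    assert(chunk_size > 0);

    std::ifstream in(path, std::ios::binary);
    std::vector<char> chunk(chunk_size);
    std::vector<std::string> lines;
    std::string pending;  // unfinished line from the previous chunk

    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0) {
            break;
        }
        const std::size_t n = static_cast<std::size_t>(got);

        std::size_t start = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (chunk[i] == '\n') {
                pending.append(chunk.data() + start, i - start);
                lines.push_back(pending);
                pending.clear();
                start = i + 1;
            }
        }
        // Keep whatever follows the last newline for the next chunk.
        pending.append(chunk.data() + start, n - start);
    }

    // Last line of a file that does not end in '\n'.
    if (!pending.empty()) {
        lines.push_back(pending);
    }
    return lines;
}
```

With a chunk size of a few megabytes, for example `read_lines_chunked("test.txt", 4 * 1024 * 1024)`, the file is pulled in with a small number of large reads. Because the file is opened in binary mode, files with Windows line endings would keep a trailing `'\r'` on each line.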

Basile Starynkevitch