4

I want to load one million random integers from a .txt into a vector using the command line:

program.exe < million-integers.txt

My code below works, but takes several seconds to run. Is there something I can do to make it faster? I've found some solutions on SO, but they all seem to rely on hard-coding the file path. I want to be able to pass a filename through the command line.

vector<int> data;
int input;

while (cin >> input)
{
    data.push_back(input);
}
cout << "Data loaded." << endl;

(C++ noob using Visual Studio on Win 8.1)

Edit: In this case, I know that some improvement can be made since I have someone else's .exe that can do it in under a second.

Edit: All integers are on the same line.

Jim V
  • How did you build the program? Did you do it in release mode, with all optimisations turned on? – Lightness Races in Orbit May 30 '14 at 17:40
  • You should consider [`reserve`](http://www.cplusplus.com/reference/vector/vector/reserve/)ing space for the values in your `std::vector`. Otherwise, it's going to be resizing and copying elements over and over again, like [Schlemiel](http://en.wikipedia.org/wiki/Schlemiel_the_painter%27s_Algorithm#Schlemiel_the_Painter.27s_algorithm). – Lilshieste May 30 '14 at 17:42
  • @Lilshieste: At most, `reserve()` will save 50% of the execution time, and I seriously doubt it will save more than a couple of percent. – Dietrich Epp May 30 '14 at 17:45
  • @Lilshieste: Good point. I didn't think of that. – Lightness Races in Orbit May 30 '14 at 17:46
  • @DietrichEpp: Where did you come up with that 50% figure? – Lightness Races in Orbit May 30 '14 at 17:46
  • I used "Build Solution" with all default settings. – Jim V May 30 '14 at 17:51
  • @LightnessRacesinOrbit: I did the math wrong, it would be 67%. Count memory bandwidth. To populate the vector, you need to do at least one read and one write for every element. To resize the vector, the same thing applies. The worst-case number of copies due to resize is less than 2 per element. I had wrongly thought it was 1 per element. – Dietrich Epp May 30 '14 at 17:52
  • @DietrichEpp: Number of copies != Time taken to perform copies – Lightness Races in Orbit May 30 '14 at 17:57
  • @LightnessRacesinOrbit: It's an estimate. I think something fishy is going on with the C++ implementation that's making it slow. – Dietrich Epp May 30 '14 at 17:58
  • @JimV my answer has been down voted into oblivion but I have posted a gist of how I managed to make it 10 times faster. I hope this is the solution you were looking for. – Spundun May 30 '14 at 18:13
  • @DietrichEpp: Yeah there's something odd going on here. – Lightness Races in Orbit May 30 '14 at 18:22
  • @JimV You want to accept any of the answers? Were you able to resolve your issue? – Spundun May 31 '14 at 19:37
  • @Spundun At this point I don't understand enough to implement half of these, but I'll keep at it. For now I just accepted the more in depth of the highest voted answers. – Jim V Jun 01 '14 at 21:03

5 Answers

7

Run time: 4.08s. What? That's slow!

Why is this happening?

I did profiling. I'm using a very different system: OS X 10.8, with Clang, but my program is also slow, and I suspect it is for the same reason. Here are two lines from the profiling results (apologies for formatting):

Running Time    Self        Symbol Name
3389.0ms   79.3%    76.0             std::__1::num_get<char, std::__1::istreambuf_iterator<char, std::__1::char_traits<char> > >::do_get(std::__1::istreambuf_iterator<char, std::__1::char_traits<char> >, std::__1::istreambuf_iterator<char, std::__1::char_traits<char> >, std::__1::ios_base&, unsigned int&, long&) const
824.0ms   19.2% 8.0          std::__1::basic_istream<char, std::__1::char_traits<char> >::sentry::sentry(std::__1::basic_istream<char, std::__1::char_traits<char> >&, bool)

As you can see, these two functions account for 98.5% of the execution time. Wow! When I drill down, what are these library functions calling that takes so much time?

  • flockfile()
  • funlockfile()
  • pthread_mutex_unlock()

So, on my system, the implementation for std::cin works with C's <stdio.h> functions so they can both be used in the same program, and these functions make sure to synchronize with other threads. This is inefficient.

  1. There is no code using <stdio.h>, so there is no need to synchronize with it.

  2. There is only one thread using stdin, so locking is excessive, especially if you lock once per character read. That's super excessive. Locks and system calls are pretty fast... but if you do something like 10 million locks and system calls? No longer fast.

Note: Yes, I am running OS X, and the actual functions will be different on Windows. Instead of flockfile() and pthread_mutex_unlock() you will see whatever the Windows version is.

Solution #1

Stop using redirection. If you use ifstream, then it is assumed that you will take care of locking yourself. On my system, this gives a runtime of 0.42 seconds, close to a factor of 10 faster.

Solution #2

Read everything into a string, and then parse the string. This allows you to continue using redirection to read the file.
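A minimal sketch of this approach, assuming the integers are whitespace-separated as in the question (the helper name `parse_ints` is mine, not from the answer):

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Parse every base-10 integer out of a whitespace-separated string.
// strtol skips leading whitespace itself, so no manual tokenizing is needed.
std::vector<int> parse_ints(const std::string& text) {
    std::vector<int> data;
    const char* p = text.c_str();
    char* end = nullptr;
    for (long v = std::strtol(p, &end, 10); p != end;
         v = std::strtol(p, &end, 10)) {
        data.push_back(static_cast<int>(v));
        p = end;  // advance past the number just parsed
    }
    return data;
}
```

The whole of standard input can then be slurped in one go with `std::string text((std::istreambuf_iterator<char>(std::cin)), std::istreambuf_iterator<char>());` (from `<iterator>`) and handed to `parse_ints`, so the `program.exe < million-integers.txt` invocation keeps working.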

Solution #3

Disable locking on std::cin. Sorry folks, I don't know how to do that. It might be possible.

Performance limits

I suspect the ifstream version is nowhere near the performance limit of your computer. If performance were critical, I suspect you can get the warm-cache runtime close to 2 or 3 ms, when your program is only limited by memory bandwidth.

Dietrich Epp
  • In Xcode, press command-I, or select "Profile" from the Product menu. This will launch Instruments. I used the "time profiler". Make sure to set any necessary command-line arguments in advance (Product -> Scheme -> Edit Scheme -> Arguments Passed On Launch). – Dietrich Epp May 30 '14 at 20:02
  • Awesome! But I wonder how you redirected a file for profiling this way. – Spundun May 30 '14 at 20:41
  • @Spundun: I used `int fdes = open(path, O_RDONLY)` and then `dup2(fdes, 0)`, which is the same way Bash does it. It's a bit of a hack to do it after the program starts. – Dietrich Epp May 30 '14 at 20:54
  • pthread_mutex_lock done properly should not make any system calls unless there is contention. – doron May 31 '14 at 12:15
  • @doron: Right, as I said, they are library functions, and there is no contention. – Dietrich Epp May 31 '14 at 17:00
  • There is a sync_with_stdio option you can disable. http://msdn.microsoft.com/en-us/library/7yxhba01.aspx – Puppy Jun 02 '14 at 14:35
5

There are two things you might try.

  1. use std::vector<T>::reserve to size your vector up front
  2. use std::istream::read to read the data in 4K (or bigger) blocks and parse the data yourself.

iostreams are supposed to be buffered and vectors are supposed to have a good resize strategy, but sometimes these things do not work as well as expected.

doron
  • The main (90%) performance penalty comes from using cin instead of ifstream, because cin tries to be thread safe. See my answer and Dietrich's answer. – Spundun May 31 '14 at 00:21
1

Use a regular ifstream instead of cin. Pass the name of the file on the command line (argv) and open it, instead of using cin.

#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

int main(int argc, char *argv[]){
  if (argc < 2) return 1;     // expect the filename as the first argument
  vector<int> data;
  int input;
  ifstream in(argv[1]);       // open the file named on the command line

  while (in >> input)
  {
        data.push_back(input);
  }
  cout << "Data loaded." << endl;
}

@DietrichEpp shed more light on why my solution was 10 times faster. When you use cin, it tries to be thread safe and adds extra synchronization overhead making it run much slower.


I have verified that my solution works and runs 10 times faster. I'm using a MacBook Pro (2012, I think) with flash storage:

cpp $ruby gen_data.rb > data.txt 
cpp $c++ -O3 test1.cpp 
cpp $time ./a.out < data.txt 
Data loaded.

real    0m3.048s
user    0m3.023s
sys 0m0.011s
cpp $c++ -O3 test2.cpp 
cpp $time ./a.out data.txt 
data.txtData loaded.

real    0m0.351s
user    0m0.334s
sys 0m0.010s
Spundun
  • This is not how file redirection works. When you use shell redirection with bash, the standard input file descriptor is just a regular file—which is *exactly* the same thing you get if you open the file using a path. You can even seek on it! There is *zero* overhead. I suggest reading about how `fork()` and `exec()` work with respect to file descriptors, although the question is on Windows 8.1, which makes everything different. – Dietrich Epp May 30 '14 at 17:29
  • @DietrichEpp Ok, as hard as it was to receive criticism. At first I thought maybe you are right, after all what I had said was an educated guess, I don't have all the bash, fork, exec, pipe and filesystem implementation at the tip of my tongue. But fortunately I took the time to verify my answer, and I'm actually getting around 90% gain in time. Please verify this. I think mine is the right answer. – Spundun May 30 '14 at 18:12
  • You have done some research, but the difference here is between `ifstream` and `std::cin`, and it has nothing to do with Bash. Your explanation for *why* is still wrong. – Dietrich Epp May 30 '14 at 18:14
  • @DietrichEpp I'm not claiming that bash is at fault, I'm just explaining how the data is ending up in the program in two scenarios. I did say that he is using stdin and a pipe instead of straight file I/O. While I may not have explained in detail where the performance bottleneck occurs, I don't think I led him in the wrong direction. I meant to tell him that there might be overhead associated with this system because it has more components. – Spundun May 30 '14 at 18:20
  • Are you sure this isn't the effect of file system caching? Run the same test but in the opposite order. – Blastfurnace May 30 '14 at 18:20
  • @Blastfurnace I have run the tests 3 times already but I'll do what you said for completeness – Spundun May 30 '14 at 18:20
  • @Blastfurnace still the same result – Spundun May 30 '14 at 18:21
  • @DietrichEpp I have updated my explanation to hopefully clarify what I'm trying to say. What does everyone think now? – Spundun May 30 '14 at 18:25
  • Try putting `std::cin.sync_with_stdio(false);` at the top of main in `test1.cpp`. – Blastfurnace May 30 '14 at 18:25
  • @Blastfurnace still same numbers. – Spundun May 30 '14 at 18:27
  • @Spundun: The problem with your explanation is that you say there is a pipe. This is wrong, there is no pipe. When you redirect a file using `<`, you get a handle to the file directly, and no pipe is created. There is *zero* communication between bash and the program. – Dietrich Epp May 30 '14 at 18:40
  • @DietrichEpp if that is correct then you are right, I need to fix my answer. is there anywhere this is documented? – Spundun May 30 '14 at 18:42
  • I can elaborate. When bash uses `<`, it opens the file. Then when it forks, that file descriptor is passed to the child process and with `dup2()` becomes file descriptor #0, and then bash uses `exec()` to start your program. When you use `ifstream`, you get a different file descriptor (probably #3), and you open it after `exec()`, but it's still the exact same kind of file descriptor, with the exact same system calls used to read it, and it reads directly from the file with no help from bash. You could read a Unix programming book, any good Unix programming book explains this. – Dietrich Epp May 30 '14 at 18:45
  • @DietrichEpp Thanks, for the support. I understand your explanation. It does sound like a good way to implement the bash < feature. So that's why you think the performance improvement is coming solely from cin specific quirks? – Spundun May 30 '14 at 19:25
  • @Spundun: Yes, according to my profiling, most of the performance penalty is because `std::cin` is built to work with multiple threads, but since there is only one thread working with `std::cin`, that effort is wasted. – Dietrich Epp May 30 '14 at 19:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/54818/discussion-between-spundun-and-dietrich-epp). – Spundun May 30 '14 at 22:29
1

Another option you may want to look into is ios_base::sync_with_stdio, which defaults to true. This option keeps std::cin synchronized with C's <stdio.h> streams so that you can intermix C and C++ I/O, and the extra synchronization slows down reading a lot. A more detailed analysis can be found in this answer.

Svalorzen
0

Yes, and No. Yes, there is a faster way to retrieve the data once the OS gives it to you. No, your program is at the mercy of the OS and the OS will send data from the file to your program as the OS chooses (There may be other tasks that can interrupt your program from receiving data continuously).

Read large amounts into memory
Memory is always faster to access than a device, such as a thumb drive or hard drive.

One read request for a large amount of data is usually much faster than many read requests for a small amount of data.

Taken together, these facts suggest that you should allocate (reserve) a large amount of memory and read into that memory. This may need to be repeated, depending on the amount of memory you can reserve and the size of the file.

Example:

  const unsigned int BUFFER_SIZE = 1024 * 1024;
  std::vector<char> buffer(BUFFER_SIZE);   // heap allocation; 1 MiB is too big for the stack
  cin.read(buffer.data(), BUFFER_SIZE);
  std::streamsize count = cin.gcount();    // number of characters actually read

Use std::istringstream
You can create an std::istringstream that uses the buffer. Thus you can "read" your integers from the text buffer as you would from cin or a text file.

Overhead of file redirection
When you redirect a file to the input of your program, the OS must open the file, read some data from the file, then pass it to your program. This overhead is the principal bottleneck of your program. If your program opened the file directly, you could avoid the overhead and make your program run faster.

Search the web for "C++ cin rdbuf setbuf" for information on redirecting cin to use a buffer of your choosing.

Thomas Matthews
  • Are you sure that redirection has overhead? At least on Unix systems, redirection has zero overhead whatsoever. – Dietrich Epp May 30 '14 at 19:14
  • @DietrichEpp: So what you are saying is in Unix systems, the program opens the file directly when using input redirection? My understanding is that the OS needs to switch the `cin` driver to use a file reading driver rather than a keyboard reading driver. My understanding of your comment is that this file reading driver is the same driver used when a program opens and reads a file. There is still overhead because of the switching to the file reading driver and reverting back to the keyboard reading driver. – Thomas Matthews May 30 '14 at 19:23
  • Any good book on Unix programming has the details, but here's the summary. Forget about drivers, those are in the kernel and irrelevant here. Open files are identified by number, and #0 is standard input. You can put any file in slot #0, such as a handle to a terminal window, a regular file, or a network socket: to the program, they're all just file handles. In the terminal, normally #0 is a pipe hooked up to the terminal program. When you use `<` in bash, instead, #0 is just a regular file. – Dietrich Epp May 30 '14 at 19:27
  • If you open a file from within a program, the only difference is that the file number will be different, and *only* because file #0 is already in use. So if you call `fopen()` or `open()` or create an `ifstream`, you probably get file #3 or #4 instead. But it is still exactly the same kind of file as if you use `<` in bash: just a regular file, not a pipe. – Dietrich Epp May 30 '14 at 19:29