Say I wanted to initialise a std::vector of objects e.g.

class Person { int ID; std::string name; ... };

from a file that contains one line per object. One route is to overload operator>> and simply std::cin >> temp_person. Another, which I used to favour, is to sscanf("%...", &...) into a bunch of temporary primitive variables and then .emplace_back(Person(temp_primitives...)).

Which way achieves the quickest runtime ignoring memory footprint? Is there any point in mmap()ing the entire file?

Alex Petrosyan
  • *Which way achieves the quickest runtime ignoring memory footprint?* `sscanf/fscanf` is probably a tad faster, but the difference might not be significant. Give it a shot and find out. – R Sahu Aug 08 '18 at 22:11
  • "*Which way achieves the quickest runtime ignoring memory footprint?*" - profile the code and find out for yourself – Remy Lebeau Aug 08 '18 at 22:16
  • Since you asked about `mmap` performance, see my answers: https://stackoverflow.com/questions/37172740/how-does-mmap-improve-file-reading-speed/37173063#37173063 and more to the point: https://stackoverflow.com/questions/33616284/read-line-by-line-in-the-most-efficient-way-platform-specific/33620968#33620968 – Craig Estey Aug 08 '18 at 22:16
  • Overloading `operator>>` mostly determines the syntax for calling it. If you want to implement that using `scanf`, that's pretty trivial to do. In other words, this isn't really an either/or situation. You can do one or both as you see fit. – Jerry Coffin Aug 08 '18 at 22:16
  • `sscanf` does not have a format specifier for `std::string`. – Thomas Matthews Aug 08 '18 at 22:18
  • You can use `std::cin` and `>>` without overloading the `>>` operator. Just read into a temporary object and push it into the vector like you do with `sscanf`. Might as well stick with C++ I/O if you don't need to use C I/O. – eesiraed Aug 08 '18 at 22:19
  • Note: if you are using std::cin rather than a file, remember to unlink the C++ buffer from the C buffer. That will improve performance. [`std::ios::sync_with_stdio(false);`](https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio) – Martin York Aug 08 '18 at 22:23
  • @MartinYork, unfortunately I do need to mix C/C++ io, otherwise it’s sound advice. – Alex Petrosyan Aug 09 '18 at 11:08
  • @AlexPetrosyan I question the "need". You may happen to have C code mixed with your C++, but need is a strong word. You just don't want to pay the effort to refactor it out, which could be a valid choice. But by mixing these completely different languages you will not get the full power of C++. Pick one language; it makes things simpler in the long run. – Martin York Aug 09 '18 at 17:30
  • @MartinYork, I understand what you mean, but sometimes you simply can't refactor the code out, e.g. when you're using an external library and you're not allowed to touch it (my case). – Alex Petrosyan Aug 09 '18 at 17:36
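
Two points from the comments can be combined: `operator>>` can itself be implemented with `sscanf`, provided the string field is parsed into a bounded `char` buffer first. A minimal sketch, assuming each line has the form `<ID> <name>` with a whitespace-free name of at most 63 characters. The `Person` layout, the line format, and the helper name `parse_person` are assumptions for illustration, since the question elides the rest of the class.

```cpp
#include <cstdio>
#include <istream>
#include <string>

// Hypothetical record type; the question only shows the ID and name members.
struct Person {
    int ID = 0;
    std::string name;
};

// Parses one line of the assumed form "<ID> <name>".
// Returns true only if both fields were read.
bool parse_person(const char* line, Person& p) {
    int id = 0;
    char name[64] = {};
    // sscanf cannot write into std::string, so read into a bounded buffer.
    if (std::sscanf(line, "%d %63s", &id, name) != 2) {
        return false;
    }
    p.ID = id;
    p.name = name;
    return true;
}

// Extraction operator: reads one line, then hands it to sscanf.
std::istream& operator>>(std::istream& in, Person& p) {
    std::string line;
    if (std::getline(in, line) && !parse_person(line.c_str(), p)) {
        in.setstate(std::ios_base::failbit);  // malformed line
    }
    return in;
}
```

With this in place the call site reads `stream >> temp_person` either way, so which parser sits behind the operator can be swapped and timed without touching the rest of the code.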

1 Answer

Since you are reading from a file, the performance is going to be I/O-bound. Almost no matter what you do in memory, the effect on the overall performance is not going to be detectable.

I would prefer the operator>> route, because this would let me use the input iterator idiom of C++:

std::istream_iterator<Person> eos;
std::istream_iterator<Person> iit(inputFile);
std::copy(iit, eos, std::back_inserter(person_vector));

or even

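// Braces around the iterator arguments avoid the "most vexing parse",
// which would otherwise declare a function named person_vector.
std::vector<Person> person_vector(
    std::istream_iterator<Person>{inputFile},
    std::istream_iterator<Person>{}
);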
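
For the iterator idiom to compile, `Person` needs a stream extraction operator. A minimal sketch, assuming each record is an integer ID followed by a single whitespace-free name. The `Person` layout, the record format, and the helper names `read_people` and `read_people_from_text` are assumptions for illustration and do not come from the question.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical Person; the question only shows the ID and name members.
struct Person {
    int ID = 0;
    std::string name;
};

// Reads one record of the assumed form "<ID> <name>".
// A failed read sets failbit, which ends the istream_iterator range.
std::istream& operator>>(std::istream& in, Person& p) {
    return in >> p.ID >> p.name;
}

// Builds the vector directly from the stream via the iterator-pair constructor.
std::vector<Person> read_people(std::istream& in) {
    return std::vector<Person>(std::istream_iterator<Person>{in},
                               std::istream_iterator<Person>{});
}

// Convenience wrapper for trying the sketch on in-memory text instead of a file.
std::vector<Person> read_people_from_text(const std::string& text) {
    std::istringstream in(text);
    return read_people(in);
}
```

Passing an open `std::ifstream` to `read_people` works the same way, because it only relies on the `std::istream` interface.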
Martin York
Sergey Kalinichenko
  • Beat me to it. :-). But you can also use the `std::vector` constructor that takes two iterators: `std::vector person_vector(iit, eos);` – Martin York Aug 08 '18 at 22:20
  • @MartinYork Thank you very much, that makes perfect sense! – Sergey Kalinichenko Aug 08 '18 at 22:30
  • @AlexPetrosyan Sure, `std::istreambuf_iterator` should work as well. I am not sure what advantages you would get from operating on the "raw" character level, though. – Sergey Kalinichenko Aug 09 '18 at 11:01
  • @dasblinkenlight, I think this is the exact use case for buffered I/O. In Java, a `BufferedReader` used to read objects from standard input does not construct temporary objects for assignment character by character, but object by object. In my case this leads to a performance bump, so I think it's worthwhile. – Alex Petrosyan Aug 09 '18 at 11:04
  • @AlexPetrosyan In C++ the implementation is very different, so "raw characters" means something else. See [this answer](https://stackoverflow.com/a/34458016/335858) for a very detailed explanation of what's different (spoiler: not much). – Sergey Kalinichenko Aug 09 '18 at 11:20