#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

vector<char> f1()
{
    ifstream fin{ "input.txt", ios::binary };
    return
    {
        istreambuf_iterator<char>(fin),
        istreambuf_iterator<char>()
    };
}

vector<char> f2()
{
    vector<char> coll;
    ifstream fin{ "input.txt", ios::binary };
    char buf[1024];
    while (fin.read(buf, sizeof(buf)))
    {
        copy(begin(buf), end(buf),
            back_inserter(coll));
    }

    copy(begin(buf), begin(buf) + fin.gcount(),
        back_inserter(coll));

    return coll;
}

int main()
{
    f1();
    f2();
}

Obviously, f1() is more concise than f2(), so I prefer f1() to f2(). However, I worry that f1() may be less efficient than f2().

So, my question is:

Will the mainstream C++ compilers optimize f1() to make it as fast as f2()?

Update:

I used a 130 MB file to test in release mode (Visual Studio 2015 with Clang 3.8):

f1() takes 1614 ms, while f2() takes 616 ms.

f2() is faster than f1().

What a sad result!

xmllmx
  • Which is faster? - should be measured. One thing which comes to mind is that it would be good to `reserve` the required memory for the `vector` to avoid reallocations – Dusteh Dec 14 '16 at 10:35
  • Also, it may be worth considering using a rope, does not directly concern choice of input library, but anyway: http://stackoverflow.com/questions/2826431/stl-rope-when-and-where-to-use – Erik Alapää Dec 14 '16 at 11:43

2 Answers


I've checked your code on my side using MinGW 4.8.2. Out of curiosity I've added an additional function f3 with the following implementation:

inline vector<char> f3()
{
    ifstream fin{ "input.txt", ios::binary };
    fin.seekg (0, fin.end);
    size_t len = fin.tellg();
    fin.seekg (0, fin.beg);

    vector<char> coll(len);
    fin.read(coll.data(), len);
    return coll;
}

I've tested using a file ~90 MB long. On my platform the results were a bit different from yours.

  • f1() ~850ms
  • f2() ~600ms
  • f3() ~70ms

The results were calculated as mean of 10 consecutive file reads.

The f3 function takes the least time because vector<char> coll(len); allocates all the required memory up front, so no further reallocations need to be done. As for back_inserter, it requires the type to have a push_back member function, which for vector reallocates when the capacity is exceeded. As described in the docs:

push_back

This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.

Of the f1 and f2 implementations, the latter is slightly faster even though both use back_inserter. f2 is probably faster because it reads the file in chunks, which allows some buffering to take place.
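If presizing the vector with its constructor is not an option, calling reserve() before the copy avoids the repeated reallocations described above while keeping the iterator-based style of f1. A sketch (the function name is illustrative, not from the question):

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

using namespace std;

// f1 plus an up-front reserve() sized via seekg/tellg, so that
// back_inserter's push_back never triggers a reallocation.
vector<char> f1_reserved(const char* path)
{
    ifstream fin{ path, ios::binary };
    fin.seekg(0, fin.end);
    size_t len = fin.tellg();
    fin.seekg(0, fin.beg);

    vector<char> coll;
    coll.reserve(len);  // capacity is set once; size grows via push_back
    copy(istreambuf_iterator<char>(fin), istreambuf_iterator<char>(),
        back_inserter(coll));
    return coll;
}
```

This removes the reallocation cost, though it still pays for the per-character iteration, so it is unlikely to match f3's single bulk read().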

Dusteh
  • My observation using this approach is that, yes, the memory for coll is updated, but the vector container is not aware of any of the changes. If you ask that vector its size, it will report zero. – Arcin B Jun 11 '19 at 14:35

If the file is smaller than a few GB, you can read it all at once:

#include <sys/stat.h>
        ....

char* buf;
FILE* fin;
const char* filename = "myfile.cgt";
#ifdef _WIN32
    struct _stat st;
    if (_stat(filename, &st) == -1) return 0;
#else
    struct stat st;
    if (stat(filename, &st) == -1) return 0;
#endif
    fin = fopen(filename, "rb");
    if (!fin) return 0;
    buf = (char*)malloc(st.st_size);
    if (!buf) { fclose(fin); return 0; }
    fread(buf, st.st_size, 1, fin);
    fclose(fin);

Needless to say, in C++ you should use new (or better, a std::vector) rather than malloc().
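A sketch of the same read-everything-at-once idea in more idiomatic C++, using std::vector for the buffer and RAII for the FILE handle so nothing leaks on early returns (the function name is illustrative):

```cpp
#include <cstdio>
#include <memory>
#include <vector>
#include <sys/stat.h>

using namespace std;

// Read the whole file into a vector; returns an empty vector on failure.
// Uses POSIX stat for brevity; on Windows/MSVC the call is spelled _stat.
vector<char> read_whole_file(const char* filename)
{
    struct stat st;
    if (stat(filename, &st) == -1) return {};

    // unique_ptr closes the FILE automatically on every exit path.
    unique_ptr<FILE, int (*)(FILE*)> fin(fopen(filename, "rb"), fclose);
    if (!fin) return {};

    vector<char> buf(st.st_size);
    if (fread(buf.data(), 1, buf.size(), fin.get()) != buf.size()) return {};
    return buf;
}
```

The vector owns the buffer, so there is no manual free() and the caller gets the size for free via buf.size().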

jurhas