8

I have a huge contiguous array x that I fread from a file.

How do I drop this chunk into a std::vector<>? In other words, I would prefer the result to end up in a std::vector<> rather than a raw array, but I want the resulting C++ code to be as efficient as this plain C version, which reads the chunk straight into the array.

From searching around, I think I may have to use placement-new in some form, but I'm uncertain about the sequence of calls and ownership issues. Also, do I need to worry about alignment issues?

I am testing with T = unsigned, but I expect a reasonable solution to work for any POD struct.

using T = unsigned;
FILE* fp = fopen( outfile.c_str(), "rb" );  // "rb": binary mode, so bytes are not translated on any platform
T* x = new T[big_n];
fread( x, sizeof(T), big_n, fp );

// how do I get x into std::vector<T> v
// without calling a gazillion push_backs() or copies ?!?

delete[] x;
fclose( fp );
kfmfe04

2 Answers

11

Use the std::vector constructor that sets the size of the vector, then use std::vector::data to get a pointer to the allocated memory.

Keeping with your use of fread:

std::vector<T> x(big_n);
fread(x.data(), sizeof(T), big_n, fp);
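
For completeness, the two lines above can be wrapped into a small helper with error handling. This is only a sketch: the name `read_binary` is hypothetical, and it assumes the file holds raw objects of type T written on the same platform. The `static_assert` rejects types for which fread cannot work, and the final `resize` handles a file that is shorter than expected.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper (not from the original answer): reads up to `count`
// objects of type T from a binary file straight into a std::vector<T>.
template <typename T>
std::vector<T> read_binary(const std::string& path, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "fread only makes sense for trivially copyable types");

    std::vector<T> v(count);  // one allocation; elements are zero-initialized

    std::FILE* fp = std::fopen(path.c_str(), "rb");
    if (!fp) throw std::runtime_error("cannot open " + path);

    // fread writes directly into the vector's contiguous storage.
    const std::size_t got = std::fread(v.data(), sizeof(T), count, fp);
    std::fclose(fp);

    v.resize(got);  // keep only the elements that were actually read
    return v;
}
```

A call such as `read_binary<unsigned>(path, big_n)` then replaces the new[]/fread/delete[] sequence from the question, and the vector owns the memory.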

As others have noted, fread will most likely not work if T is not a POD type. In that case you can use C++ streams and std::istreambuf_iterator to read the file into the vector. However, this has the drawback that it loops over every item in the file, and if big_n is as big as it sounds, that might be a performance problem.


However, if the file truly is big, I would rather recommend using memory mapping to read the file.

Some programmer dude
  • That's a simple solution, but not quite "as efficient as this plain C-version" since it initialises the vector with zero values. – Mike Seymour Jan 17 '13 at 13:41
  • If T will ever be a non-POD, you'd need to call the destructors manually before you overwrite `.data()`, right? – jrok Jan 17 '13 at 13:42
  • @jrok If T is anything but a character type, you can't reliably read it with `fread`. And if T isn't a POD, it's almost certain that you can't read it. – James Kanze Jan 17 '13 at 13:44
  • I actually tried `mmap` before this. In fact, because I want a really big chunk of the file at once, `fread` is actually faster. But I think for lots of random/small reads, `mmap` would be preferable. – kfmfe04 Jan 17 '13 at 13:46
  • @kfmfe04 No, mmap should be faster for lots of data, since it allocates the whole vector at once without reallocations. Maybe you did something wrong. – BЈовић Jan 17 '13 at 14:23
  • The only thing I dislike about this solution is that it breaks the `std::vector` encapsulation by accessing _private_ memory managed by the vector via the `data` method. – PaperBirdMaster Jan 17 '13 at 14:46
  • @BЈовић I retried `mmap` with Joachim's solution combined with `memcpy` and it's running much faster than before - you're right: something was off with the earlier version. – kfmfe04 Jan 17 '13 at 14:53
  • @kfmfe04 Good. Now try with some casting and without memcpy – BЈовић Jan 17 '13 at 14:58
  • @BЈовић unfortunately, I need the `memcpy` so I can do a `std::sort` as part of an external sort - so I need to flush the results to disk afterwards. – kfmfe04 Jan 17 '13 at 15:07
  • @kfmfe04 `std::sort` can sort normal raw arrays as well, which is what the memory pointer returned by `mmap` can be seen as. – Some programmer dude Jan 17 '13 at 15:10
  • @PaperBirdMaster, `data()` is a _public_ member, added in C++11 specifically to allow you to do this. You could already do it with `&x[0]` in C++03 anyway. The memory layout of a `std::vector` is part of its interface. – Jonathan Wakely Jan 17 '13 at 22:48
  • @JoachimPileborg @BЈовић I appreciate your suggestions: I ended up using `mmap` in conjunction with `std::transform` to shape the data the way I needed it. ty. – kfmfe04 Jan 18 '13 at 02:30
  • @JonathanWakely `vector::data()` may be _for_ this functionality, but I feel troubled using it. This method returns a pointer to the internal memory presumably managed by the class, which IMO is insecure: after calling `data` you can read/write through the returned pointer, pass it to functions, or store it in another object, and sometimes you won't be sure about the lifetime of the memory it points to. Yes, doing this is bad practice, but `vector::data()` is a _public_ member and allows you to do it. – PaperBirdMaster Jan 18 '13 at 09:13
  • @PaperBirdMaster, the same is true if you have a pointer to an array, or to any object. If you store that pointer somewhere it might be unsafe because the object went out of scope. You need to be careful, yes, but then this is C++ :) With great power comes great responsibility. – Jonathan Wakely Jan 18 '13 at 09:25
  • @JonathanWakely The fact is that you're right, but I feel troubled "_breaking encapsulations_" anyways :P – PaperBirdMaster Jan 18 '13 at 09:37
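
The zero-initialization cost raised in the first comment can be avoided with a custom allocator that default-initializes elements instead of value-initializing them. The following is a sketch of that known technique, not something proposed in the thread: the name `default_init_allocator` is hypothetical, and the resulting vector type differs from a plain `std::vector<T>`, which may or may not be acceptable.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical allocator adaptor: turns value-initialization requests
// (e.g. from vector(n) or resize(n)) into default-initialization, so
// elements of trivial types are left uninitialized instead of zeroed.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // reuse the base allocator's constructors

    // Called when the container constructs an element with no arguments.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;  // default-init: no zeroing for trivial U
    }

    // All other constructions are forwarded to the base allocator.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Convenience alias (also hypothetical).
template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;
```

With this, `uninit_vector<unsigned> x(big_n);` followed by `fread(x.data(), sizeof(unsigned), big_n, fp);` allocates once and skips the zero fill. The elements hold indeterminate values until they are written, so they must not be read before the fread succeeds.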
0

This will read the file into a vector using std::istreambuf_iterator:

#include <vector>
#include <fstream>
#include <iterator>
// ...

std::ifstream testFile("testfile", std::ios::binary);
std::vector<unsigned char> fileContents((std::istreambuf_iterator<unsigned char>(testFile)),
                           std::istreambuf_iterator<unsigned char>());

This answer comes from a previous answer: https://stackoverflow.com/a/4761779/942596
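
As a comment on this answer points out, the iterator's character type has to match the stream's, and std::ifstream is a stream of char. A variant that should compile is sketched below; the helper name `read_all_bytes` is hypothetical. Each char is converted to unsigned char as it is copied into the vector.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file into a vector of bytes.
inline std::vector<unsigned char> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);

    // istreambuf_iterator<char> matches std::ifstream's character type.
    // If the file could not be opened, the range is empty.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

This still visits the file one character at a time, so for very large files it is likely to be slower than a single bulk read into `data()`.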

andre
  • +1 for an interesting alternative - I suppose this should also work for any `std::vector<T>`? I'll try it out. – kfmfe04 Jan 17 '13 at 13:56
  • @kfmfe04 That's a good question I have never tested that. Let me know if you can. – andre Jan 17 '13 at 13:58
  • This code doesn't work, you can't use `istreambuf_iterator<unsigned char>` to read from `basic_istream<char>` (the character types don't match). Also, in C++11 the second iterator can be written simply `{}` – Jonathan Wakely Jan 17 '13 at 22:53
  • `istreambuf_iterator + 5` is not working. So it is impossible to read chunk of file using this method – Arkady Jan 28 '18 at 23:00
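
Reading only a chunk of the file, or reading elements of a type other than char, is possible with streams by combining `seekg` with `istream::read` instead of iterators. A minimal sketch, assuming T is trivially copyable and the file holds raw T objects; the name `read_chunk` and its parameters are hypothetical:

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: reads up to `count` objects of type T, starting at
// element index `first`, from a binary file.
template <typename T>
std::vector<T> read_chunk(const std::string& path, std::size_t first,
                          std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw reads only make sense for trivially copyable types");

    std::vector<T> v(count);
    std::ifstream in(path, std::ios::binary);

    // Skip to the first requested element, then read the block in one call.
    in.seekg(static_cast<std::streamoff>(first * sizeof(T)));
    in.read(reinterpret_cast<char*>(v.data()),
            static_cast<std::streamsize>(count * sizeof(T)));

    // gcount() reports how many bytes were actually read; drop the rest.
    v.resize(static_cast<std::size_t>(in.gcount()) / sizeof(T));
    return v;
}
```

If the file cannot be opened or the chunk lies past the end of the file, the returned vector is simply shorter than `count`, possibly empty.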