I am currently learning C++ and need to read a file containing more than 5000 values of type double. Since push_back copies the existing elements whenever the vector has to reallocate, I was trying to figure out a way to reduce the computational work. Note that the file may contain an arbitrary number of doubles, so allocating memory by specifying a large enough vector is not the solution I am looking for.

My idea is to quickly read the whole file and get an approximate size for the array. In "Save & read double vector from file C++?" I found an interesting idea, which is used in the code below.

Basically, the vector containing the file data is stored in a struct named PathStruct. Bear in mind that PathStruct contains more than this vector, but for the sake of simplicity I removed everything else. The function receives a reference to the PathStruct pointer and reads the file.

struct PathStruct
{
    std::vector<double> trivial_vector;
};

bool getFileContent(PathStruct *&path)
{
    std::ifstream filename("simplePath.txt", std::ios::in | std::ifstream::binary);
    if (!filename.good())
        return false;
    std::vector<char> buffer{};
    std::istreambuf_iterator<char> iter(filename);
    std::istreambuf_iterator<char> end{};
    std::copy(iter, end, std::back_inserter(buffer));
    path->trivial_vector.reserve(buffer.size() / sizeof(double));
    memcpy(&path->trivial_vector[0], &buffer[0], buffer.size());
    return true;
};

int main(int argc, char **argv)
{

    PathStruct *path = new PathStruct;

    const int result = getFileContent(path);

    return 0;
}

When I run the code, the program aborts with the following error:

corrupted size vs. prev_size, Aborted (core dumped).

I believe my problem lies in an incorrect use of pointers. They are definitely not my strongest point, but I cannot find the problem. I hope someone can help out this poor soul.

kmaqueta
  • Do you have a minimum amount of data you know the file will have? – NathanOliver Aug 09 '19 at 12:33
  • There is no point (just looking at your code) to allocate two vectors, it seems to be just a waste of resources. Moreover, I think a critical point is how your doubles are coded in the file. In binary form? ASCII/textual representation (with a fixed length?) etc – BiagioF Aug 09 '19 at 12:35
  • 5000 `double` take up a whopping 40kB of memory, I'd just reserve a reasonable maximum size and use `push_back`, keep it simple. If you then find that is slow that is the time to start optimising. – Alan Birtles Aug 09 '19 at 12:37
  • The issue with your code is calling `reserve` rather than `resize` which means the vector is still empty so you aren't allowed to write to it – Alan Birtles Aug 09 '19 at 12:38
  • The ".txt" suffix is usually used with text. – molbdnilo Aug 09 '19 at 12:38
  • check [this answer](https://stackoverflow.com/questions/12983069/dynamically-allocating-memory-to-struct-when-reading-from-file-in-c) it should help you – Amirouche Zeggagh Aug 09 '19 at 12:38
  • @molbdnilo but he uses the `std::ifstream::binary` flag to open the file. – Timo Aug 09 '19 at 12:39
  • @Timo I don't see your point. If the file contains binary data, the suffix is surprising. If it contains text, the reading is completely wrong. – molbdnilo Aug 09 '19 at 12:42
  • @molbdnilo I think I misunderstood your comment. I thought you were pointing out that OP uses ASCII because he uses a txt file (as a reply to Biagio Festa's comment). My bad. – Timo Aug 09 '19 at 12:47
  • @BiagioFesta, an example of the file simplePath.txt could be something such as: `0.166908 0.228805 -0.038947`. And yes, numbers have fixed length. – kmaqueta Aug 09 '19 at 13:41
  • @kmaqueta so why are you reading the file in binary format? – BiagioF Aug 09 '19 at 13:54
  • @BiagioFesta I followed an example proposed in [Save & read double vector from file C++](https://stackoverflow.com/questions/46663046/save-read-double-vector-from-file-c). It was the closest example that I could find for a similar problem. They don't mention the type of file that was used, so I assume that it would also work with txt file. My mistake then. – kmaqueta Aug 09 '19 at 14:08
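
Pulling the comments above together: since the file turned out to hold textual numbers such as `0.166908 0.228805 -0.038947`, the bytes cannot simply be copied into a `std::vector<double>`; each number has to be parsed. A minimal sketch of that, assuming whitespace-separated values; the helper name `readTextDoubles` and the reserve estimate of 8192 are illustrative choices, not something from the thread:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads whitespace-separated doubles from a text file.
// Returns false if the file cannot be opened.
bool readTextDoubles(const std::string& fileName, std::vector<double>& out)
{
    std::ifstream input(fileName);  // text mode, not binary
    if (!input)
        return false;

    out.clear();
    // Assumed estimate: reserve only sets capacity, the vector stays empty
    // and still grows on its own if the file holds more values than this.
    out.reserve(8192);

    double value;
    while (input >> value)      // formatted extraction parses "0.166908" into a double
        out.push_back(value);   // push_back is what actually adds elements

    return true;
}
```

With the question's struct this could be called as `readTextDoubles("simplePath.txt", path->trivial_vector)`. Note that `reserve` followed by `push_back` is valid, whereas `reserve` followed by writing through `&v[0]` is not, which is the bug pointed out in the comments.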

1 Answer

If your file contains only consecutive binary double values, you can check the file size and divide it by the size of a double. To determine the file size you can use std::filesystem::file_size, but this function is only available from C++17 onward. If you cannot use C++17, you will need another method for determining the file size, such as seeking to the end of the stream and querying the position with tellg.
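
For the pre-C++17 case, one common approach is to open the stream positioned at its end and ask for the current position. This is a sketch only: the helper name `fileSizeInBytes` is hypothetical, and treating the value returned by `tellg` as a byte count is an assumption that holds for binary-mode files on mainstream platforms rather than a guarantee of the standard.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns the size of a file in bytes,
// or 0 if the file cannot be opened.
std::size_t fileSizeInBytes(const std::string& fileName)
{
    // std::ios::ate moves the read position to the end right after opening.
    std::ifstream input(fileName, std::ios::binary | std::ios::ate);
    if (!input)
        return 0;

    const std::streamoff size = input.tellg();  // position at end == size in bytes (assumed)
    return size < 0 ? 0 : static_cast<std::size_t>(size);
}
```

The result can be used in place of `std::filesystem::file_size(fileName)` in the snippets below.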

auto fileName = "file.bin";
auto fileSize = std::filesystem::file_size(fileName);
std::ifstream inputFile("file.bin", std::ios::binary);
std::vector<double> values;
values.reserve(fileSize / sizeof(double));
double val;
while(inputFile.read(reinterpret_cast<char*>(&val), sizeof(double)))
{
    values.push_back(val);
}

or using pointers:

auto numberOfValues = fileSize / sizeof(double);
std::vector<double> values(numberOfValues);
// Note that numberOfValues * sizeof(double) is passed as the number of bytes to read instead of fileSize,
// because fileSize may not be divisible by sizeof(double)
inputFile.read(reinterpret_cast<char*>(values.data()), numberOfValues * sizeof(double));

Alternative

If you can modify the file structure, you can store the number of double values at the beginning of the file and read this number before reading the values themselves. This way you always know how many values to read, without checking the file size.
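
A rough sketch of this count-prefixed layout is shown here. The format (a `std::uint64_t` count followed by that many raw doubles) and the helper names `writeCounted` and `readCounted` are assumptions made for illustration; the file is only portable between machines with the same endianness and double representation.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Assumed layout (not from the original post):
// [std::uint64_t count][count raw doubles]
bool writeCounted(const std::string& fileName, const std::vector<double>& values)
{
    std::ofstream output(fileName, std::ios::binary);
    if (!output)
        return false;

    const std::uint64_t count = values.size();
    output.write(reinterpret_cast<const char*>(&count), sizeof(count));
    output.write(reinterpret_cast<const char*>(values.data()),
                 static_cast<std::streamsize>(count * sizeof(double)));
    return output.good();
}

bool readCounted(const std::string& fileName, std::vector<double>& values)
{
    std::ifstream input(fileName, std::ios::binary);
    if (!input)
        return false;

    std::uint64_t count = 0;
    if (!input.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;

    // resize (not reserve) so the elements exist before they are overwritten.
    values.resize(count);
    input.read(reinterpret_cast<char*>(values.data()),
               static_cast<std::streamsize>(count * sizeof(double)));
    return static_cast<bool>(input);  // false if fewer bytes were available than promised
}
```

In real code it would be worth checking the count against a sane upper bound before calling `resize`, since a corrupted header could otherwise request a huge allocation.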

Alternative 2

You can also change the container from std::vector to std::deque. This container is similar to std::vector, but instead of keeping a single buffer for its data, it holds many smaller arrays. If you insert data and the current array is full, an additional array is allocated and linked without copying the previous data. This comes at a small price, however: element access requires two pointer dereferences instead of one.
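
As an illustration of the deque variant, here is a minimal sketch that reads raw binary doubles one at a time. The helper name `readIntoDeque` is hypothetical. Because a deque's storage is not contiguous, the single bulk `read` into `values.data()` shown earlier is not available, so values are appended individually.

```cpp
#include <deque>
#include <fstream>
#include <string>

// Hypothetical helper: reads raw binary doubles into a std::deque.
// Returns an empty deque if the file cannot be opened.
std::deque<double> readIntoDeque(const std::string& fileName)
{
    std::deque<double> values;
    std::ifstream input(fileName, std::ios::binary);

    double value;
    // read() fails once fewer than sizeof(double) bytes remain, ending the loop.
    while (input.read(reinterpret_cast<char*>(&value), sizeof(value)))
        values.push_back(value);  // existing elements are never moved when the deque grows

    return values;
}
```

No size estimate is needed up front, which is the main appeal when the number of values is unknown.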