-1

So I created this program to write vectors of integer into a binary file, then retrieve the data again back.

//vec.cpp
#include <iostream>
#include <vector>
#include <fstream>

template<typename T>
void writeKey(std::string filename, std::vector<T> arr)
{

    arr.insert(arr.begin(),(T)arr.size());
    
    std::ofstream write_bin(filename, std::ios::out | std::ios::binary);
    if(!write_bin)
    {
        std::cout << "ERROR: writing vector to file!\n";
        exit(1);
    }
    
    for(size_t i = 0; i < arr.size(); i++)
    {
        write_bin.write((char *) &arr[i], sizeof(int));
    }

    write_bin.close();
    if(!write_bin.good())
    {
        std::cout<<"ERROR: writing time error\n";
        exit(1);
    }
}

template<typename T>
std::vector<T> readKey(std::string filename)
{
    std::ifstream read_bin(filename, std::ios::out | std::ios::binary);
    if(!read_bin)
    {
        std::cout<<"ERROR: reading binary file!\n";
        exit(1);
    }

    size_t limit;
    read_bin.read((char*)&limit, sizeof(T));

    std::cout<<"limit : "<<limit<<'\n';

    std::vector<T> arr(limit,0);

    for(size_t i=0; i<limit; ++i)
    {
        read_bin.read((char*)&arr[i], sizeof(T));
    }

    read_bin.close();

    return arr;
}

int main()
{
    
    std::vector<int> mykey = {5,10,15,20};
    writeKey("test.key", mykey);

    std::vector<int> mykeys = readKey<int>("test.key");

    for(auto e: mykeys)
        std::cout<<e<<' ';
    std::cout<<'\n';

    return 0;
}

So you see what I did here is I compile the program that only call the writeKey() function then run it... The program it runs perfectly

int main()
{
    
    std::vector<int> mykey = {5,10,15,20};
    writeKey("test.key", mykey);

    return 0;
}

Then I compiled it again but this time I only call the readkey() function, then I run it, and again it runs as intended

int main()
{
    std::vector<int> mykeys = readKey<int>("test.key");

    for(auto e: mykeys)
        std::cout<<e<<' ';
    std::cout<<'\n';

    return 0;
}

The problem arises the moment I call these two function inside the main() function, then compile and run it here the limit variable in readkey function is having some kind of overflowed value instead of the value that I inserted at the begining of the vector in the writekey function

int main()
{
    
    std::vector<int> mykey = {5,10,15,20};
    writeKey("test.key", mykey);

    std::vector<int> mykeys = readKey<int>("test.key");

    for(auto e: mykeys)
        std::cout<<e<<' ';
    std::cout<<'\n';

    return 0;
}

What is happening here? and how can I fix this?

this is my compilation flags : g++ -o vec.o vec.cpp -Wall -Wextra -fsanitize=address

  • it seems that the size_t limit; is not right the moment I call these two function together in the main() why is that? –  Jun 26 '21 at 07:50
  • 1
    `write_bin.write((char *) &arr[i], sizeof(int));` Why use `sizeof(int)` instead of `sizeof(T)`? Also why pass the vector by copy instead of by const ref? Also it's better not to use c-style casts; use `static_cast`/`reinterpret_cast` instead. – fabian Jun 26 '21 at 08:01
  • Also there's the "small issue" that you're you're writing an (signed) `int` to the file and reading this data to `size_t` which is likely to use have a different number of bytes in addition to being unsigned. Lets consider the case where `int` uses 32 bit and `size_t` uses 64 bit: Since the `size_t limit` is not initilized, it may contain arbitrary data, but you only overwrite the first 32 bit of the data when reading from the file; furthermore even if you initialize `limit` with 0 on a big endian machine you've just multiplied the value by 2^32... – fabian Jun 26 '21 at 08:08
  • @fabian As you can see I insert an element to the begining of the vector thats why I passed a copy of it because I don't want to change the original vector, changing the (int) to (T) also did not help, also static_cast, and reinterpret_cast did not solve it, I have also tried using unsigned long as vector type but it still the same error when I call the two function together in the main() function –  Jun 26 '21 at 08:32
  • "it throws an error" What error? – n. m. could be an AI Jun 26 '21 at 08:39
  • an -fsanitize=address error, whats happening is the variable limit is not having the right value that I inserted in the begining of the vector instead I has some kind of junk overflowed value, this only happens though when I call the two function at the main() function, but not if I compile them separately then run the program that have each function its in the example above –  Jun 26 '21 at 08:43

1 Answers1

1

As already mentioned in the comments by @fabian you are treating your arr.size() as an int (T) when it is actually from type std::size_t. The problem is that std::size_t is 8bytes long (if you are running your program on a 64bit machine) and int only 4bytes.

First, you are converting your std::size_t to an int in the line arr.insert(arr.begin(),(T)arr.size()); so it "fits" into your vector. You are effectively get rid of half the bytes while doing this. When you then write your vector to the file you wrote 4bytes instead of the needed 8. Now if you read back your values with read_bin.read((char*)&limit, sizeof(T)); you are reading, well, 4bytes into limit, which now has the type std::size_t, i.e. consists of 8bytes again so only the first 4 bytes of your limit are changed the rest is left untouched. Running this code in VS in debug-mode you can examine what happens (bytes that changed are marked red): enter image description here

Only the first 4 bytes change and the rest is left untouched. Because I ran it in debug-mode the rest is set to cc so the value of the limit becomes 14757395255531667460. This value is too large for the vector and so std::length_error is thrown. If you would run this in release mode you would get undefined behavior because you cannot know what bytes were on the location before the limit variable. Maybe everything runs ok (because all the bytes were 0) but likely you will get a wrong value for the limit.

To fix this just treat arr.size() as what it is, a std::size_t and don't store it in the first position of your vector (which can only hold T), instead just write it to your file before you write your values and replace sizeof(int) with sizeof(T) (this only works in this example), so your write function becomes:

template<typename T>
void writeKey(const std::string& filename, const std::vector<T>& arr) {

    std::ofstream write_bin(filename, std::ios::out | std::ios::binary);
    if (!write_bin) {
        std::cout << "ERROR: writing vector to file!\n";
        exit(1);
    }

    std::size_t limit = arr.size();
    write_bin.write(reinterpret_cast<char*>(&limit), sizeof(std::size_t));

    if (limit != 0)
        write_bin.write(reinterpret_cast<const char*>(&arr[0]), sizeof(T) * limit);

    write_bin.close();
    if (!write_bin.good()) {
        std::cout << "ERROR: writing time error\n";
        exit(1);
    }
}

Now, you only have to read your limit back before you read all the other values so your read function becomes:

template<typename T>
std::vector<T> readKey(const std::string& filename) {

    std::ifstream read_bin(filename, std::ios::out | std::ios::binary);
    if (!read_bin) {
        std::cout << "ERROR: reading binary file!\n";
        exit(1);
    }

    std::size_t limit;
    read_bin.read(reinterpret_cast<char*>(&limit), sizeof(std::size_t));

    std::vector<T> arr(limit, 0);
    if (limit != 0)
        read_bin.read(reinterpret_cast<char*>(&arr[0]), sizeof(T) * limit);

    read_bin.close();

    std::cout << "limit : " << limit << '\n';

    return arr;
}

I changed some other things like passing const references to the functions and removing the for-loops with a single write/read call. Also, see When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?.

Sebphil
  • 488
  • 5
  • 12