
I have the following loop:

for (int i = 1; i <= epochs; ++i) {
    for (std::vector<std::filesystem::path>::iterator it = batchFiles.begin(); it != batchFiles.end(); ++it) {
        struct fann_train_data *data = fann_read_train_from_file(it->string().c_str());
        fann_shuffle_train_data(data);
        float error = fann_train_epoch(ann, data);
    }
}

ann is the network.
batchFiles is a std::vector<std::filesystem::path>.

This code iterates through all the training data files in a folder and trains the ANN on each one in turn, repeating the whole pass as many times as the epochs variable specifies.

The following line causes a memory leak:

struct fann_train_data *data = fann_read_train_from_file(it->string().c_str());

I have to keep switching between the training files because I don't have enough memory to load them all at once; otherwise I would load the training data just once.

Why does this happen? How can I resolve this?

daedsidog
  • I am not familiar with ANN, but the [documentation here](http://leenissen.dk/fann/fann_1_2_0/r670.html) suggests that `fann_destroy_train` should be called on each iteration. – PaulMcKenzie Mar 04 '19 at 02:46
  • Thank you. I wasn't aware that the data has its own destroying function, as the `fann_destroy` one would not accept training data. – daedsidog Mar 04 '19 at 12:45

2 Answers

In C++, memory is automatically freed when the object managing it goes out of scope. (Assuming the class was properly written.) That's called RAII.

But FANN presents a C API, not a C++ API. In C, you need to manually free memory when you're done with it. By extension, when a C library creates an object for you, it typically needs you to tell it when you're done with the object. The library doesn't have a good way to figure out on its own when the object's resources should be freed.

The convention is that whenever a C API gives you a function like struct foo* create_foo(), you should be looking for a corresponding function like void free_foo(struct foo* f). It's symmetrical.
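As a minimal sketch of that symmetry, the snippet below pairs a hypothetical create_foo with a hypothetical free_foo and keeps a counter of live objects. None of these names belong to FANN; they only mirror the example names in the paragraph above, and the live_foos counter is added purely for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical C-style API, named after the create_foo/free_foo example above.
struct foo { int value; };

static int live_foos = 0;  // objects created but not yet freed (illustration only)

foo* create_foo() {
    foo* f = static_cast<foo*>(std::malloc(sizeof(foo)));
    if (f) { f->value = 0; ++live_foos; }
    return f;
}

void free_foo(foo* f) {
    if (!f) return;
    std::free(f);
    --live_foos;
}

// Creates and releases one foo per iteration; returns how many are still live.
int process_batches(int batches) {
    for (int i = 0; i < batches; ++i) {
        foo* f = create_foo();
        assert(f != nullptr);
        f->value = i;
        free_foo(f);  // without this call, live_foos grows by one per iteration
    }
    return live_foos;
}
```

Calling process_batches with any batch count should leave the counter at zero. Dropping the free_foo call reproduces the pattern in the question: one leaked object per loop iteration.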

In your case, as originally noted by PaulMcKenzie, you need void fann_destroy_train_data(struct fann_train_data * train_data). From the documentation:

Destructs the training data and properly deallocates all of the associated data. Be sure to call this function after finished using the training data.

Maxpm

Since fann_destroy_train_data must be called for every data set you load, you can let C++ RAII make that call for you with the following wrapper:

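struct fann_wrapper
{
   fann_train_data *td;
   explicit fann_wrapper(fann_train_data* p) : td(p) {}
   // skip the call if loading failed and the pointer is null
   ~fann_wrapper() { if (td) fann_destroy_train_data(td); }
   // non-copyable: a copy would destroy the same training data twice
   fann_wrapper(const fann_wrapper&) = delete;
   fann_wrapper& operator=(const fann_wrapper&) = delete;
};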
//...
for (int i = 1; i <= epochs; ++i) {
    for (std::vector<std::filesystem::path>::iterator it = batchFiles.begin(); it != batchFiles.end(); ++it) {
        struct fann_train_data *data = fann_read_train_from_file(it->string().c_str());

        // the next line ensures that fann_destroy_train_data is called
        fann_wrapper fw(data);

        fann_shuffle_train_data(data);
        float error = fann_train_epoch(ann, data);
    }  // fw goes out of scope here, so fann_destroy_train_data is always called
}

The fann_wrapper simply holds the fann_train_data pointer, and when the fann_wrapper is destroyed, the fann_train_data is destroyed with it.

This is much safer than the raw C approach whenever an exception could be thrown (for whatever reason) between loading the data and freeing it. With the fann_wrapper, stack unwinding runs the destructor, so the fann_train_data is always destroyed. The C approach cannot make that guarantee, because an exception skips every line after the throw, including the call to fann_destroy_train_data.

Example:

for (int i = 1; i <= epochs; ++i) {
    for (std::vector<std::filesystem::path>::iterator it = batchFiles.begin(); it != batchFiles.end(); ++it) {
        struct fann_train_data *data = fann_read_train_from_file(it->string().c_str());
        fann_shuffle_train_data(data);
        float error = fann_train_epoch(ann, data);

        fann_destroy_train_data(data); // this line is not executed if an exception is thrown above, thus a memory leak
    }
}  

This is why RAII is an important concept in C++. Resources get cleaned up automatically, regardless of how the enclosing block is exited (an exception is thrown, an early return is taken, etc.).
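As an alternative to a hand-written wrapper, the same guarantee can be sketched with std::unique_ptr and a custom deleter. This is a different technique from the fann_wrapper above, not a replacement the answer itself proposes. So that the snippet compiles without FANN installed, train_data_stub, read_train_stub and destroy_train_stub are hypothetical stand-ins for fann_train_data, fann_read_train_from_file and the FANN destroy function. Check the exact name of the destroy function against your FANN header, since the comment above refers to it as fann_destroy_train.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for the FANN type and functions (assumptions, not FANN's API).
struct train_data_stub { int samples; };
static int live_data = 0;  // data sets loaded but not yet destroyed

train_data_stub* read_train_stub() { ++live_data; return new train_data_stub{10}; }
void destroy_train_stub(train_data_stub* d) { delete d; --live_data; }

// Deleter that forwards to the C-style destroy function.
struct train_data_deleter {
    void operator()(train_data_stub* d) const { destroy_train_stub(d); }
};
using train_data_ptr = std::unique_ptr<train_data_stub, train_data_deleter>;

// Loads one batch and "trains" on it; throws when fail is true.
void train_one_batch(bool fail) {
    train_data_ptr data(read_train_stub());
    if (fail) throw std::runtime_error("training failed");
    // Real code would pass data.get() to the C training functions here.
}

// Runs a failing batch and reports how many data sets are still alive afterwards.
int live_after_failure() {
    try { train_one_batch(true); } catch (const std::runtime_error&) {}
    return live_data;
}
```

Because std::unique_ptr does not invoke its deleter on a null pointer, a failed load needs no extra check before destruction, and the type is move-only, so the data cannot be destroyed twice through an accidental copy.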

PaulMcKenzie