
I am currently writing some code to create a neural network, and I am trying to make it as optimised as possible. I want to be able to get the amount of memory consumed by an object of type Network, since memory usage is very important in order to avoid cache misses. I tried using sizeof(), but this does not work: I assume that vectors store their values on the heap, so sizeof() just reports the size of the pointers. Here is my code so far.

#include <iostream>
#include <vector>
#include <cstdlib>   // std::rand, RAND_MAX
#include <chrono>

class Timer
{
private:
    std::chrono::time_point<std::chrono::high_resolution_clock> start_time;
public:
    Timer(bool auto_start=true)
    {
        if (auto_start)
        {
            start();
        }
    }
    void start()
    {
        start_time = std::chrono::high_resolution_clock::now();
    }
    float get_duration()
    {
        std::chrono::duration<float> duration = std::chrono::high_resolution_clock::now() - start_time;
        return duration.count();
    }
};

class Network
{
public:
    std::vector<std::vector<std::vector<float>>> weights;
    std::vector<std::vector<std::vector<float>>> deriv_weights;
    std::vector<std::vector<float>> biases;
    std::vector<std::vector<float>> deriv_biases;
    std::vector<std::vector<float>> activations;
    std::vector<std::vector<float>> deriv_activations;
};

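// Builds a fully connected network from layer sizes, e.g. {784, 800, 16, 10}.
// weights[i][j][k] is the weight from node k of layer i to node j of layer i + 1.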
Network create_network(const std::vector<int>& layers)
{
    Network network;
    network.weights.reserve(layers.size() - 1);
    int nodes_in_prev_layer = layers[0];
    for (unsigned int i = 0; i < layers.size() - 1; ++i)
    {
        int nodes_in_layer = layers[i + 1];
        network.weights.push_back(std::vector<std::vector<float>>());
        network.weights[i].reserve(nodes_in_layer);
        for (int j = 0; j < nodes_in_layer; ++j)
        {
            network.weights[i].push_back(std::vector<float>());
            network.weights[i][j].reserve(nodes_in_prev_layer);
            for (int k = 0; k < nodes_in_prev_layer; ++k)
            {
                float input_weight = float(std::rand()) / RAND_MAX;
                network.weights[i][j].push_back(input_weight);
            }
        }
        nodes_in_prev_layer = nodes_in_layer;
    }
    return network;
}

int main() 
{
    Timer timer;
    Network network = create_network({784, 800, 16, 10});
    std::cout << timer.get_duration() << std::endl;
    std::cout << sizeof(network) << std::endl;
    std::cin.get();
}
finlay morrison
  • If you're _really_ lazy, just add a "numFloats++" to your innermost loop; memory usage is then roughly `numFloats*sizeof(float)`. – Botje Apr 14 '20 at 12:05
  • Ah, that's a nice simple solution, thanks. I'm not worried about the performance of the measuring code, just the release code, so this will work just fine. – finlay morrison Apr 14 '20 at 12:09
  • Are the dimensions fixed? That is, do all inner vectors at the same level have the same length? – Daniel Langr Apr 14 '20 at 12:11
  • No, they can have variable lengths. – finlay morrison Apr 14 '20 at 12:13
  • And, if you care about memory efficiency, wouldn't it be much more efficient to place all floats into a 1-D vector? With some additional "indexing" structure. – Daniel Langr Apr 14 '20 at 12:20
  • It's not just memory efficiency I care about; I care about performance, and the memory efficiency will help with that. I experimented with something like that, but it seems that it would perform worse than just having a multidimensional vector. – finlay morrison Apr 14 '20 at 12:23
  • With a vector of vectors, inner vectors are "randomly" placed in memory with some overhead, which is cache-unfriendly (and SIMD-unfriendly as well). Having all the elements in a 1D vector will provide better cache utilization. What you need is basically an efficient 2D or even 3D _jagged array_ implementation, which is not an easy task, I guess (see the sketch below). – Daniel Langr Apr 14 '20 at 12:27
  • Alright, I'll take a look into it. Thanks! – finlay morrison Apr 14 '20 at 12:31
  • @finlaymorrison: The key insight is that you use each weight of your matrix once per input vector. Both CPUs and memory perform best when doing sequential loads. Therefore, the overriding concern is that all weights of a layer must be stored sequentially in the order used. Anything else is a _lot_ slower. Each weight takes just a single FMA, so any overhead whatsoever can halve your speed. – MSalters Apr 14 '20 at 12:32
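Picking up the flat-storage suggestion from the comments: a minimal sketch of a 1-D jagged weight store (the FlatWeights name and its members are illustrative, not from the question's code). All weights live in one contiguous buffer, so heap usage reduces to essentially a single `data.capacity() * sizeof(float)` allocation:

#include <vector>
#include <cstddef>

// All weights of all layers in one contiguous buffer, plus offset tables
// so that weight (layer, node, input) can still be addressed directly.
struct FlatWeights
{
    std::vector<float> data;              // every weight, stored sequentially
    std::vector<std::size_t> layer_start; // layer_start[l] = index of layer l's first weight
    std::vector<int> row_width;           // row_width[l] = inputs per node in layer l

    float& at(int layer, int node, int input)
    {
        return data[layer_start[layer] + std::size_t(node) * row_width[layer] + input];
    }
};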

2 Answers


I've recently updated our production neural network code to AVX-512, so this is definitely real-world production code. A key part of our optimisations is that each matrix is not a std::vector, but a 1-D AVX-aligned array. Even without AVX alignment, we see a huge benefit in moving to a one-dimensional array backing each matrix. This means the memory access will be fully sequential, which is much faster. The size is then simply `(rows * cols) * sizeof(float)`.

We store the bias as the first full row. Commonly that's implemented by prefixing the input with a 1.0 element, but for our AVX code we use the bias as the starting value for the FMA (Fused Multiply-Add) operations. I.e. in pseudo-code: `result = bias; for (input : inputs) result += input * weight;`. This keeps the input AVX-aligned as well.

Since each matrix is used in turn, you can safely have a std::vector<Matrix> layers.
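Roughly, that layout looks like this (an illustrative sketch, not our actual AVX code; the Matrix name is made up, and alignment handling is omitted for brevity):

#include <vector>
#include <cstddef>
#include <cassert>

// One weight matrix in a single contiguous, row-major buffer.
// Row 0 holds the biases; row i + 1 holds the weights for input i.
struct Matrix
{
    int rows, cols;           // rows = inputs + 1 (bias row), cols = outputs
    std::vector<float> data;  // rows * cols floats, fully sequential

    Matrix(int r, int c) : rows(r), cols(c), data(std::size_t(r) * c) {}
    float& at(int r, int c) { return data[std::size_t(r) * cols + c]; }
    const float& at(int r, int c) const { return data[std::size_t(r) * cols + c]; }
};

// result = bias; for (input : inputs) result += input * weight
std::vector<float> forward(const Matrix& m, const std::vector<float>& inputs)
{
    assert(int(inputs.size()) + 1 == m.rows);
    std::vector<float> result(m.data.begin(), m.data.begin() + m.cols); // start from the bias row
    for (std::size_t i = 0; i < inputs.size(); ++i)
        for (int j = 0; j < m.cols; ++j)
            result[j] += inputs[i] * m.at(int(i) + 1, j); // sequential, FMA-friendly accumulation
    return result;
}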

MSalters

To quote from https://stackoverflow.com/a/17254518/7588455:
Vector stores its elements in an internally-allocated memory array. You can do this:

sizeof(std::vector<int>) + (sizeof(int) * MyVector.size())

This will give you the size of the vector structure itself plus the size of all the ints in it, but it may not include whatever small overhead your memory allocator may impose. I'm not sure there's a platform-independent way to include that.


In your case, only the internally-allocated memory arrays matter, since those are what you're actually accessing. Also be aware of how you're accessing that memory.
In order to write cache-friendly code, I highly recommend reading through this SO post: https://stackoverflow.com/a/16699282/7588455
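Applied recursively to your nested vectors, that formula looks roughly like this (a sketch; approx_bytes is a made-up helper, and allocator overhead as well as any capacity() > size() slack are ignored):

#include <vector>
#include <cstddef>

// Innermost level: the vector object itself plus its heap array,
// following sizeof(std::vector<float>) + sizeof(float) * size().
std::size_t approx_bytes(const std::vector<float>& v)
{
    return sizeof(v) + sizeof(float) * v.size();
}

// Outer levels: the inner vector objects live in the outer vector's heap
// array, so summing approx_bytes(inner) already counts that array.
template <typename T>
std::size_t approx_bytes(const std::vector<std::vector<T>>& v)
{
    std::size_t total = sizeof(v);
    for (const auto& inner : v)
        total += approx_bytes(inner);
    return total;
}

// With the Network from the question in scope:
// std::size_t bytes = approx_bytes(network.weights) + approx_bytes(network.biases) + ...;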

WolverinDEV