0

I'm writing a C++ program to perform calculations on a huge graph and therefore has to be as fast as possible. I have a 100MB textfile of unweighted edges and am reading them into a 2D vector of integers (first index = nodeID, then a sorted list of nodeIDs of nodes which have edges to that node). Also, during the program, the edges are looked up exactly in the order in which they're stored in the list. So my expectation was that, apart from a few bigger gaps, it'd always be nicely preloaded to the cache. However, according to my profiler, iterating through the edges of a player is an issue. Therefore I suspect, that the 2D vector isn't placed in memory compactly.

How can I ensure that my 2D vector is as compact as possible and the subvectors in the order in which they should be? (I thought for example about making a "2D array" from the 2D vector, first an array of pointers, then the lists.)

BTW: In case it wasn't clear: The nodes can have different numbers of edges, so a normal 2D array is no option. There are a couple ones with lots of edges, but most have very few.

EDIT:

I've solved the problem and my program is now more than twice as fast: There was a first solution and then a slight improvement:

  1. I put the lists of neighbour ids into a 1D integer array and had another array to know where a certain id's neighbour lists start

  2. I got a noticeable speedup by replacing the pointer array (a pointer needs 64 bit) with a 32 bit integer array containing indices instead

Tobs40
  • 65
  • 7
  • 1
    To make a 2d vector or array compact, declare it as a one dimensional array of (width * height) elements. You can always calculate the index into the 1d array based on row and column coordinates. More compaction requires compression algorithms. – Thomas Matthews Jul 15 '20 at 16:43
  • Another idea at optimization is to use an array of structs or classes rather than a 2d array. The structs ensure that the items are next to each other in the cache; otherwise the processor may have to reload the cache to fetch a value in another column. – Thomas Matthews Jul 15 '20 at 16:47
  • 1
    You might find Mike Acton's talk about [data-oriented design](https://www.youtube.com/watch?v=rX0ItVEVjHc) useful, especially around 36 minute mark. It doesn't directly address your problem but it may give you some ideas on how to ensure your data is cache friendly. The talk definitely changed the way I think about passing and using data in my code. – NotAProgrammer Jul 15 '20 at 16:59
  • @ThomasMatthews I used that pseudo-2d-array trick for my chess engine, but unfortunately I can't calculate the index because I'd have to know how many entries there were previously. Or do you mean storing the sublists next to each other in an array and having another array pointing at where one player starts? That'd be pretty awesome I guess! – Tobs40 Jul 16 '20 at 17:47
  • @ThomasMatthews I don't get the struct idea though. So an array of structs and within each struct the sublist?? – Tobs40 Jul 16 '20 at 17:48
  • @NotAProgrammer Thanks I'll have a look at around minute 36 – Tobs40 Jul 16 '20 at 17:48
  • If you want variables close to each other, in the processor's data cache, put them into a struct. So for example, using parallel arrays `int A[2048], B[2048], C[2048];`, The element `B[0]` is at least 2048 locations from `A[0]` and `C[0]` even further away. The processor may have to reload it's data cache to access `B[0]` and `C[0]`. However, when using an array of struct, the values `A[0],B[0]` and `C[0]` will next to each other. – Thomas Matthews Jul 16 '20 at 17:54
  • @ThomasMatthews I got that, but I don't see how that's gonna help me with putting a lot of sublists next to each other – Tobs40 Jul 17 '20 at 18:06

2 Answers2

1

What data structure are you using for the 2d vector? If you use std::vector then the memory will be contiguous.

Next, if pointers are stored then only the address will take advantage of the vectors spacial locality. Are you accessing the object pointed to when iterating the edges and if so this could be a bottleneck. To get around this perhaps you can setup your objects so they are also in contiguous memory and take advantage of spacial locality.

Finally the way in which you access the members of a vector affects the caching. Make sure you are accessing in an order advantageous to the container used (eg change column index first when iterating).

Here's some helpful links:

Cache Blocking Techniques

SO on cache friendly code

BoldAsLove
  • 660
  • 4
  • 13
  • I'm already accessing the main vector from low to high index. The trouble is having all those subvectors it's pointing to lying around somewhere in memory. I guess I'll go with a pointer array pointing at another 1d array which contains all the sublists (I still need constant time access) – Tobs40 Jul 16 '20 at 17:53
0

I have written a few of these type structures by having a 2D view onto a 1D vector and there are lots of different ways to do it. I have never made one that allows the internal arrays to vary in length before so this may contain bugs but should illustrate the general approach:

#include <cassert>
#include <iostream>
#include <vector>

template<typename T>
class array_of_arrays
{
public:
    array_of_arrays() {}

    template<typename Iter>
    void push_back(Iter beg, Iter end)
    {
        m_idx.push_back(m_vec.size());
        m_vec.insert(std::end(m_vec), beg, end);
    }

    T* operator[](std::size_t row) { assert(row < rows()); return &m_vec[m_idx[row]]; }
    T const* operator[](std::size_t row) const { assert(row < rows()); return &m_vec[m_idx[row]]; }

    std::size_t rows() const { return m_idx.size(); }

    std::size_t cols(std::size_t row) const
    {
        assert(row <= m_idx.size());
        auto b = m_idx[row];
        auto e = row + 1 >= m_idx.size() ? m_vec.size() : m_idx[row + 1];
        return std::size_t(e - b);
    }

private:
    std::vector<T> m_vec;
    std::vector<std::size_t> m_idx;
};

int main()
{
    array_of_arrays<int> aoa;

    auto data = {2, 4, 3, 5, 7, 2, 8, 1, 3, 6, 1};

    aoa.push_back(std::begin(data), std::begin(data) + 3);
    aoa.push_back(std::begin(data) + 3, std::begin(data) + 8);

    for(auto row = 0UL; row < aoa.rows(); ++row)
    {
        for(auto col = 0UL; col < aoa.cols(row); ++col)
        {
            std::cout << aoa[row][col] << ' ';
        }
        std::cout << '\n';
    }

}

Output:

2 4 3 
5 7 2 8 1 
Galik
  • 47,303
  • 4
  • 80
  • 117