How can this combination algorithm be optimized?

Question

I am writing a molecular dynamics program that needs to take the atoms in a molecule and find the possible ways they can bond. To do this, I have a vector of Atom objects and I generate combination pairs using the following algorithm:

    void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
        std::string bitmask(K, 1);
        bitmask.resize(N, 0);

        do {
            /* This loop takes forever with larger N values (approx. 3000) */
            std::vector<int> indexRow;

            for (int i = 0; i < N; i++)
            {
                if (bitmask[i]) indexRow.push_back(i);
            }

            indices.push_back(indexRow);
        } while (std::prev_permutation(bitmask.begin(), bitmask.end()));
    }

It is a simple N choose K algorithm (i.e. the indices returned could contain (1, 2) but not (2, 1)) where in my case N is the number of atoms in the molecule and K is 2.

I then call the algorithm like this:

void CalculateBondGraph(const std::vector<Atom *> &atoms, std::map<int, 
    std::map<int, double>> &bondGraph, ForceField *forceField) {
    int natoms = atoms.size();

    std::vector<std::vector<int>> indices;

    utils::CombinationKN(indices, 2, natoms);

    for (auto &v : indices) {
        int i = v[0];
        int j = v[1];

        /*... Check if atoms i and j are bonded based on their coordinates.*/
    }
}

The issue with this algorithm is that it takes forever to complete for large molecules that have 3000+ atoms. I have thought about parallelizing it (specifically with OpenMP), but even then, the work would have to be split among a few threads and it would still take a lot of time to complete. I need a way to optimize this algorithm so it doesn't take so long to compute combinations. Any help is appreciated.

Thank you, Vikas

Profile. Use a profiler to determine the code areas that take up the most time. Once the code is found, have compiler generate assembly listing. See if you can optimize the code better. Otherwise look at changing the design and data structures. Search the internet for "Data Driven Design" and "optimization for instruction cache" and "optimization for data cache". You may want to set the compiler's optimizations for highest and make a release build, then profile. — Thomas Matthews, Jan 04 '19 at 00:14
`std::vector>` can be surprisingly slow due to the lack of contiguous data. Often a [single `vector` masquerading as 2D](https://stackoverflow.com/a/2076668/4581301) is a better option. It looks like you have something that works that you want to make work better. Ask your rubber duck if [Code Review](https://codereview.stackexchange.com/help/asking) is right for you. [And read their how to ask pages to make sure](https://codereview.stackexchange.com/help/asking). — user4581301, Jan 04 '19 at 00:16
One major consumption of time is branching. Use algebra or Boolean Algebra to reduce instructions. If your processor supports conditional compilation, you may want to redesign your algorithm to take advantage of that. Although processors may be able to fit small loops in their instruction cache, evaluation of a branch takes more time than advancing to the next instruction. — Thomas Matthews, Jan 04 '19 at 00:20
As implemented, `CombinationKN()` resizes various containers multiple times, which (potentially) results in memory allocation and deallocation every loop iteration - so quality of implementation of the containers and their allocators is a key driver of performance. Before worrying about how to parallelise the loops, it would be worth optimising the memory reallocation (e.g. reserving a maximum possible size). — Peter, Jan 04 '19 at 00:35
Thank you for the tips. I used the matrix instead of the vector of vectors along with the new code to speed up the function. — Vikas M, Jan 07 '19 at 00:24

Matt Timmermans · Accepted Answer · 2019-01-04T04:59:30.433

Your CombinationKN function is way more expensive than it needs to be, if K is much smaller than N -- and if N is large then of course K is much smaller than N or you will run out of memory very quickly.

Notice that every valid index_row is a strictly monotonically increasing sequence of K integers less than N and vice-versa. It's easy enough to generate these directly:

void CombinationKN(std::vector<std::vector<int>> &indices, int K, int N) {
    std::vector<int> index_row;
    // lexographically first valid row
    for (int i=0; i<K; ++i) {
        index_row.push_back(i);
    }

    for(;;) {
        // output current row
        indeces.push_back(index_row);

        // increment index_row the the lexically next valid sequence
        // find the right-most index we can increment
        // This loop does O(1) amortized iterations if K is not large.  O(K) worst case
        int inc_index=K-1;
        int index_limit=N-1;
        while(inc_index >= 0 && index_row[inc_index] >= index_limit) {
          --inc_index;
          --index_limit;
        }
        if (inc_index < 0) {
            break; //all done
        }
        // generate the lexically first valid row with matching prefix and
        // larger value at inc_index
        int val = index_row[inc_index]+1;
        for (;inc_index<K; ++inc_index, ++val) {
            index_row[inc_index] = val;
        }
    }
}

Also, if the only thing you're doing with these combinations is iterating through them, then there's no reason to waste the (possible very large amount of) memory required to store the whole list of them. The above function contains a procedure for generating the next one from the previous one when you need it.

Thank you for the answer. I use these indices all over the place in my code, so I put it a function to be easier. — Vikas M, Jan 07 '19 at 00:20

How can this combination algorithm be optimized?

1 Answers1