0

I am working on an applied set-covering problem. In this research I want to generate all possible combinations. I.e. n = 5 and k = 3 yields

0 0 1
0 0 2 
0 0 3
etc..

This is no problem for smaller sized problems, but when n and k increases, say n = 250 and k = 6, the number of combinations are 3.1920e+11. All combinations can not be stored in one matrix, therefore I need an algorithm which can compute x combinations and then the x next combinations given the end point of the first matrix. Does anyone know any algorithms that does this quickly in either C/C++/CUDA or Matlab?

Thanks.

NuPagadi
  • 1,410
  • 1
  • 11
  • 31
proczell
  • 49
  • 2
  • 8
  • 3
    I think what you're planning to do with all these combinations strongly matters. Or you want just to generate them and store into file? – NuPagadi May 09 '18 at 20:38
  • 1
    The main purpose for this question is to store them in either a file or an array/matrix. I am using the combinations for some calculations later on, but I have all the code needed for this. – proczell May 09 '18 at 20:40
  • Why `0 0 0` is missing? – Slava May 09 '18 at 20:42
  • I don't want permutations or repetition, therefore no combination should have all numbers equal – proczell May 09 '18 at 20:44
  • 2
    *That* ^ should be stated in the question... – Bob__ May 09 '18 at 20:45
  • 1
    So 2 numbers equal is ok but all of them not? What else we do not know? – Slava May 09 '18 at 20:47
  • n=250, say n=256bits, which is 32bytes, with 6bits set for selection. 32bytes times 3*10^11 is roughly 10^13 bytes, 10 * 10^12, 10 Tb of data. Actually, you could easily just generate and store them all, and then deal with position in the file (or pointer if you prefer memory mapped files) – Severin Pappadeux May 09 '18 at 21:05
  • Where should I store them all? my current matlab solution stores 5 GB at each iteration (RAM limitation), and then re-calculate new 5 GB at next iteration. This works quite well, but I wonder if there are faster ways to do it. – proczell May 09 '18 at 21:26
  • "I am using the combinations for some calculations later on" -> This is relevant. You should ask for a way to generate these combinations on the fly, so your "calculations later on" can have access to these combinations without having to actually store them. It makes no sense to use petabytes of storage for this. – Cris Luengo May 10 '18 at 05:09
  • This is exactly what I am trying to say in my other comments and original post. – proczell May 10 '18 at 07:25

2 Answers2

3

I think the biggest problem you're going to encounter is not calculation but disk write speed or memory size. By the way, it seems you wrongly determined number of combinations for n = 250 and k = 6. Did you use uint64_t? My number is 244 140 625 000 000.

So for this number you need ~1.4 Petabyte (~1400 Tb) of memory. This is your main problem. If you have that much big hard drive, you'd better use memory mapping, when write. You may consider using several threads to write: each will write its own chunk of memory.

So, I think you should think of other ways for providing combinations for solving your actual goal.

A naive solution. Change std::ofstream with memory mapped object.

int main()
{
    const constexpr uint8_t N = 250;
    const constexpr uint8_t K = 6;
    const constexpr uint64_t CombinationsCount = std::pow(N, K);
    using TCombination = std::array<uint8_t, K>;

    std::cout << CombinationsCount << std::endl;

    std::ofstream file("output.txt");
    TCombination c;
    for (uint64_t i = 0; i < CombinationsCount; ++i)
    {
        auto I = i;
        for (auto j = 0; j < K; ++j)
        {
            c[j] = I % N;
            I /= N;
            file << (int)c[j];
        }
        file << std::endl;
    }

}

If you want to use threads, just divide CombinationsCount with cores number and give each thread a task to write from specific address of memory (offset).

You asked for a function-like solution. You can pass different names of files and use different threads. Buy you still need to use memory mapping.

const constexpr uint8_t N = 250;
const constexpr uint8_t K = 6;
const constexpr uint64_t CombinationsCount = std::pow(N, K);
using TCombination = std::array<uint8_t, K>;

void Generate(uint64_t start, uint64_t size, const char* fileName)
{
    std::ofstream file(fileName);
    TCombination c;
    for (uint64_t i = start; i < start + size; ++i)
    {
        auto I = i;
        for (auto j = 0; j < K; ++j)
        {
            c[j] = I % N;
            I /= N;
            file << (int)c[j];
        }
        file << std::endl;
    }
}

int main()
{
    std::cout << CombinationsCount << std::endl;

    unsigned int threadsNum = std::thread::hardware_concurrency();

    std::vector<std::thread> workers;
    for (size_t i = 0; i < threadsNum; ++i)
        workers.emplace_back(
            Generate, 
            i * CombinationsCount / threadsNum,
            CombinationsCount / threadsNum,
            (std::string("output") + std::to_string(i)).c_str());

    for (size_t i = 0; i < threadsNum; ++i)
        workers[i].join();
}
NuPagadi
  • 1,410
  • 1
  • 11
  • 31
  • 1
    I used Matlab for a quick solution, but I may have exceeded the maximum limit? As I stated in my other comment, I use a method of chopping the problem up into smaller files (generate combinations for 5GB at each time). This is somewhat feasible for k = 5, but practically impossible for k=6 due to the fact that each 5GB takes about 70 mins (on a pretty crappy computer) to generate and write to file. Therefore, I was hoping that a C/C++ implementation existed that evaluated the combinations in a shorter time. – proczell May 09 '18 at 21:27
  • Check with documentation the maximum. 32 bits is not enough. 64 is ok. – NuPagadi May 09 '18 at 21:29
  • @proczell `C++` is much better for this purpose. As I said use memory mapping for disk writing and may be threads. I'll update my answer with simple solution. You should change `ofstream` with memory mapped object. And I still encourage you to consider some other ways in couple with your actual goal. Or you really have so much memory? – NuPagadi May 09 '18 at 21:39
  • Yes, this looks very much like something I can use. My next question is then if I want to chop it into several function calls. Could your code be altered to say start at combination: 2 13 54 110 214 and count the following 1e7 combinations? – proczell May 09 '18 at 21:46
  • @proczell Sure. Move `for` loop into function, which takes start (`i`) and end (`CombinationsCount`) and a pointer where to write. There is no much sense in modifying my example. Anyway you should deal with memory mapping. – NuPagadi May 09 '18 at 21:53
  • I see. Thank you, I will give this a try and hopefully it works just as intended. – proczell May 09 '18 at 21:54
2

I am working on an applied set-covering problem. In this research I want to generate all possible combinations. ... Does anyone know any algorithms that does this quickly in either C/C++/CUDA or Matlab?

There is no such thing as generating all possible combinations "quickly". That's incredibly slow by definition as n and k increase: n!/((n-k)!k!) rises faster than (k/e)^n , asymptotically as a function of n; so making your combinations generation faster by a constant factor by using a GPU will only let you increase n and/or k by a tiny bit.

Sorry for sounding preachy, but you probably need to do something other than trying to generate all combinations.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • Of course this is not feasible for *too large* problems, but if I can increase the limit of *too large*, at least this method can be applied for more problems. – proczell May 09 '18 at 21:48
  • As you yourself observe, n and k will quickly make this infeasible. It's unlikely that you will get much traction from generating all those configurations. – einpoklum May 09 '18 at 22:39