0

I have been trying for some time now to come up with a way to compute all the various combinations of strings of words for some time now. Unlike most methods for combining on the web though, the algorithm must produce every combination, including those in which all the combined elements aren't in a single combination. ie, if I am combining 'Hello', 'New' and 'World', the combinations I am looking for are:

HelloNewWorld
HelloNew
HelloWorld
Hello
NewWorld
New
World

A professor from my college did come up with a quick and dirty solution for doing just that, but it is using nested for loops.

#include <iostream>
#include <vector>
#include <array>
#include <string>

int main()
{
std::vector<std::array<std::string, 2>> vec(3);
vec[0] = {"Hello", ""};
vec[1] = {"New", ""};
vec[2] = {"World", ""};
for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
        for (int k = 0; k < 2; k++)
            std::cout << vec[0][i] + vec[1][j] + vec[2][k] << std::endl;
}

As you might imagine, I desired a way to make this actually somewhat usable and portable. I know that this is possible with recursion, I just don't know how to implement it. Optimally, I would like to make this tail-recursive if at all possible, as the plan is to compute very large combinations. What would be the best way to do this recursively, and would it be easy to make tail-recursive?

  • Is one of the requirements that they must be words, or strictly combinations of characters? – M4rc Jul 07 '17 at 01:08
  • I highly recommend you use Howard Hinnant's work for creating efficient combination libraries in C++. They are recursive, but they are fast, portable, and easy to use. In this case, use `std::string s = "ab"; for (size_t i =0; i <= ab.size(); ++i) { for_each_combination(s.begin(), s.begin()+i, s.end(), f); }` https://howardhinnant.github.io/combinations.html – Alex Huszagh Jul 07 '17 at 01:11
  • 1
    @M4rc The requirement is that they must be words. Sorry I should have said that in my post. – Andrew LeFevre Jul 07 '17 at 01:16
  • If you need a true N-level for loop (a cartesian product), there are good implementations available. Disclosure: I wrote this one: https://gist.github.com/Alexhuszagh/67efb078a82616ed07529ed97586646a – Alex Huszagh Jul 07 '17 at 01:18
  • @AlexanderHuszagh Howard Hinnat's combination libraries look very interesting, and your code looks good for what it does, but neither will give me the output I am looking for. Neither options provide combinations with the combined elements absent from some of the combinations. That's the best way I know how to describe it. I edited my post to hopefully clarify what I am looking for. – Andrew LeFevre Jul 07 '17 at 01:26
  • @AndrewLeFevre, they will if you use Howard Hinnant's library and a for loop. I'll write an answer. In this case, you want the combinations from k=1 to k=N. – Alex Huszagh Jul 07 '17 at 01:28

4 Answers4

2

You can do this pretty efficiently by using all combinations from k=1 to k=N for a vector of N elements. Using Howard Hinnant's library available here, you can use it fairly effectively. In my case, I've named the library sampling.h, which is the only external dependency and can be viewed in it's entirety here.

#include "sampling.h"
#include <iostream>
#include <vector>


/**
 *  This function can take any container that has a bidirectional
 *  iterator (std::list, std::deque, std::vector) that contains elements
 *  of type std::string or similar, that must implement an `operator+`
 *  and `operator<<` for printing.
 */
template <typename BiDirStringContainer>
void print_combinations(BiDirStringContainer& container)
{
    auto first = container.begin();
    auto last = container.end();
    for (size_t i = 1; i <= container.size(); ++i) {
        auto mid = first + i;
        for_each_combination(first, mid, last, [](auto f, auto l) {
            std::string w;
            for (; f != l; ++f) {
                w += *f;
            }
            std::cout << w << std::endl;
            return false;
        });
    }
}


int main(void)
{
    std::vector<std::string> words = {
        "Hello",
        "New",
        "World",
    };
    print_combinations(words);

    return 0;
}

Compiling this with the C++14 standard and running it outputs:

Hello
New
World
HelloNew
HelloWorld
NewWorld
HelloNewWorld

This is exactly what your post described. Since the lambda is a custom functor, and can store state, you can do whatever you would like with the combinations: store a copy, print them, etc.

This is dramatically faster than anything you can get in the standard library without major work, or from suggestions made for the standard library. For example, std::next_combination and std::next_permutation (the former was not included, but was suggested here). I highly suggest reading Howard Hinnant's entirely blog post: it is enlightening. The time complexity on his implementations, and brute speed beats most other suggestions. If you need high performance combinations or permutations, he's already done the work for you.

Alex Huszagh
  • 13,272
  • 3
  • 39
  • 67
  • 1
    Thanks for posting the code. I very quickly skimmed through the blog post you mentioned earlier, and wrongly thought since the first few examples looked produce combinations in the classical sense, his library wouldn't fit my needs. But this library seems to be very versatile, thanks for mentioning it! I will definitely look into it more. Thanks for your help! – Andrew LeFevre Jul 07 '17 at 03:15
2

At each level it recurses both with and without the current word printing the result when it gets to the end of all the words:

#include <iostream>
#include <string>
#include <vector>

void recurse(std::vector<std::string> &values,size_t level,std::string str) {
  if (level<values.size()) {
    recurse(values,level+1,str+values[level]);
    recurse(values,level+1,str);
  } else {
    std::cout<<str<<"\n";
  }
}

int main(int argc, char*argv[]) {
  if (argc<2)
    std::cout<<argv[0]<<" <word> [<word> [...]]\n";
  else {
    std::vector<std::string> values;
    for(int i=1;i<argc;++i) {
      values.push_back(argv[i]);
    }
    recurse(values,0,"");
  }
  return 0;
}

Which, when run with ./a.out Hello New World produces:

HelloNewWorld
HelloNew
HelloWorld
Hello
NewWorld
New
World
Alex Huszagh
  • 13,272
  • 3
  • 39
  • 67
Jerry Jeremiah
  • 9,045
  • 2
  • 23
  • 32
  • If you replace taking the vector by value to taking it by reference, this solution works very well (and efficiently). – Alex Huszagh Jul 07 '17 at 02:01
  • In fact, for this specialized use case, although it has the same time complexity (I believe), it should have smaller coefficients, which a quick benchmark seemed to confirm on my compiler. – Alex Huszagh Jul 07 '17 at 02:14
  • 1
    @AlexanderHuszagh When I wrote the code the question said take a string and make combinations of characters so the function took a string. Then the question changed and I fiddled with my solution. But I missed that - thanks for the heads up. – Jerry Jeremiah Jul 07 '17 at 02:15
  • @JerryJeremiah This is great! It runs really efficiently, although it doesn't look to be tail recursive, do you use some other advanced method to achieve such performance? – Andrew LeFevre Jul 07 '17 at 03:07
  • @AndrewLeFevre Tail recursion doesn't necessarily make it faster - it just ensures you don't use all your stack space. Tail recursion optimization in C++ is unpredictable and can't generally be counted on. – Jerry Jeremiah Jul 07 '17 at 03:55
  • @AndrewLeFevre When compiled with gcc 7.1 and -03 on https://godbolt.org/ the resulting code doesn't have tail recursion but comes out to be 513 assembly instructions. The secret to fast code is to: put the data into as few cache lines as possible (contiguous memory like std::vector or std::string), put the code into as few cache lines as possible (less assembly instructions is faster), calling functions that are already in the cache (calling functions somewhere else in memory means a cache miss but recursion is ok), and make the branches predictable (less branches is therefore better) – Jerry Jeremiah Jul 07 '17 at 03:56
  • @AndrewLeFevre https://stackoverflow.com/questions/16699247/what-is-cache-friendly-code/16699282#16699282 https://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array/11227902#11227902 – Jerry Jeremiah Jul 07 '17 at 04:01
  • @AndrewLeFevre Perhaps my "unpredictable" comment about tail recursion is old knowledge. https://stackoverflow.com/questions/34125/which-if-any-c-compilers-do-tail-recursion-optimization But you were right in your comment - my code isn't written to be tail recursive anyway. I should try to fix that. – Jerry Jeremiah Jul 07 '17 at 04:04
0

If the only possibilities is for a word to appear or not appear, that makes two possibilities. So for n words you have 2^n combinations. So you just count through the 2^n numbers from 0 (including) to 2^n-1 (including) and map each bit to one word.

No recursion needed, just one counting for loop.

ndim
  • 35,870
  • 12
  • 47
  • 57
0

If i understand this correctly you want to generate all combinations of a string. in that case you can use a BFS along with a set and a queue to generate the combinations, I will try to explain.

Say your string is ABCD. You have a queue to which you add ABCD and a set to which you add ABCD now

while the queue is not empty
1) you pop the top element
2) you generate substrings of that popped element
  a) if that substring is not in the set add it to the queue

to generate substrings in step 2 you do the following

for(int i =0;i<string.length();i++)
{
  string substr1 = string.substr(0,i);
  string substr2 = string.substr(i,string.length()-1);
  string substring = substr1+substr2;
}

doing this on ABCD(the input string) will generate BCD,ACD and ABD and ABC. now add those 3 to the set and the queue

now, you add BCD,ACD and ABD to the set. Lets say BCD is queue.front(). You pop that and generate CD,BD and BC and add them to the set and queue. when you pop ACD next, you generate CD,AD and AC but now you wont add CD to the queue because it is in the set.

EDIT:

I see your issue, my answer works for a string but you can use the same principle on a vector<string> to generate all combinations ABCD would just beHello(A)World(B)...

PYA
  • 8,096
  • 3
  • 21
  • 38