3

I am trying to find the Time Complexity of this algorithm.

The iterative: algorithm produces all the bit-strings within a given Hamming distance, from the input bit-string. It generates all increasing sequences 0 <= a[0] < ... < a[dist-1] < strlen(num), and reverts bits at corresponding indices.

The vector a is supposed to keep indices for which bits have to be inverted. So if a contains the current index i, we print 1 instead of 0 and vice versa. Otherwise we print the bit as is (see else-part), as shown below:

// e.g. hamming("0000", 2);
void hamming(const char* num, size_t dist) {
    assert(dist > 0);
    vector<int> a(dist);
    size_t k = 0, n = strlen(num);
    a[k] = -1;
    while (true)
        if (++a[k] >= n)
            if (k == 0)
                return;
            else {
                --k;
                continue;
            }
        else
            if (k == dist - 1) {
                // this is an O(n) operation and will be called
                // (n choose dist) times, in total.
                print(num, a);
            }
            else {
                a[k+1] = a[k];
                ++k;
            }
}

What is the Time Complexity of this algorithm?


My attempt says:

dist * n + (n choose t) * n + 2

but this seems not to be true, consider the following examples, all with dist = 2:

len = 3, (3 choose 2) = 3 * O(n), 10 while iterations
len = 4, (4 choose 2) = 6 * O(n), 15 while iterations
len = 5, (5 choose 2) = 9 * O(n), 21 while iterations
len = 6, (6 choose 2) = 15 * O(n), 28 while iterations

Here are two representative runs (with the print to be happening at the start of the loop):

000, len = 3
k = 0, total_iter = 1
vector a = -1 0 
k = 1, total_iter = 2
vector a = 0 0 
Paid O(n)
k = 1, total_iter = 3
vector a = 0 1 
Paid O(n)
k = 1, total_iter = 4
vector a = 0 2 
k = 0, total_iter = 5
vector a = 0 3 
k = 1, total_iter = 6
vector a = 1 1 
Paid O(n)
k = 1, total_iter = 7
vector a = 1 2 
k = 0, total_iter = 8
vector a = 1 3 
k = 1, total_iter = 9
vector a = 2 2 
k = 0, total_iter = 10
vector a = 2 3 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gsamaras@pythagoras:~/Desktop/generate_bitStrings_HammDistanceT$ ./iter
0000, len = 4
k = 0, total_iter = 1
vector a = -1 0 
k = 1, total_iter = 2
vector a = 0 0 
Paid O(n)
k = 1, total_iter = 3
vector a = 0 1 
Paid O(n)
k = 1, total_iter = 4
vector a = 0 2 
Paid O(n)
k = 1, total_iter = 5
vector a = 0 3 
k = 0, total_iter = 6
vector a = 0 4 
k = 1, total_iter = 7
vector a = 1 1 
Paid O(n)
k = 1, total_iter = 8
vector a = 1 2 
Paid O(n)
k = 1, total_iter = 9
vector a = 1 3 
k = 0, total_iter = 10
vector a = 1 4 
k = 1, total_iter = 11
vector a = 2 2 
Paid O(n)
k = 1, total_iter = 12
vector a = 2 3 
k = 0, total_iter = 13
vector a = 2 4 
k = 1, total_iter = 14
vector a = 3 3 
k = 0, total_iter = 15
vector a = 3 4 
Community
  • 1
  • 1
gsamaras
  • 71,951
  • 46
  • 188
  • 305

2 Answers2

3

The while loop is somewhat clever and subtle, and it's arguable that it's doing two different things (or even three if you count the initialisation of a). That's what's making your complexity calculations challenging, and it's also less efficient than it could be.

In the abstract, to incrementally compute the next set of indices from the current one, the idea is to find the last index, i, that's less than n-dist+i, increment it, and set the following indexes to a[i]+1, a[i]+2, and so on.

For example, if dist=5, n=11 and your indexes are:

0, 3, 5, 9, 10

Then 5 is the last value less than n-dist+i (because n-dist is 6, and 10=6+4, 9=6+3, but 5<6+2).

So we increment 5, and set the subsequent integers to get the set of indexes:

0, 3, 6, 7, 8

Now consider how your code runs, assuming k=4

0, 3, 5, 9, 10
  • a[k] + 1 is 11, so k becomes 3.
  • ++a[k] is 10, so a[k+1] becomes 10, and k becomes 4.
  • ++a[k] is 11, so k becomes 3.
  • ++a[k] is 11, so k becomes 2.
  • ++a[k] is 6, so a[k+1] becomes 6, and k becomes 3.
  • ++a[k] is 7, so a[k+1] becomes 7, and k becomes 4.
  • ++a[k] is 8, and we continue to call the print function.

This code is correct, but it's not efficient because k scuttles backwards and forwards as it's searching for the highest index that can be incremented without causing an overflow in the higher indices. In fact, if the highest index is j from the end, the code uses a non-linear number iterations of the while loop. You can easily demonstrate this yourself if you trace how many iterations of the while loop occur when n==dist for different values of n. There is exactly one line of output, but you'll see an O(2^n) growth in the number of iterations (in fact, you'll see 2^(n+1)-2 iterations).

This scuttling makes your code needlessly inefficient, and also hard to analyse.

Instead, you can write the code in a more direct way:

void hamming2(const char* num, size_t dist) {
    int a[dist];
    for (int i = 0; i < dist; i++) {
        a[i] = i;
    }
    size_t n = strlen(num);
    while (true) {
        print(num, a);
        int i;
        for (i = dist - 1; i >= 0; i--) {
            if (a[i] < n - dist + i) break;
        }
        if (i < 0) return;
        a[i]++;
        for (int j = i+1; j<dist; j++) a[j] = a[i] + j - i;
    }
}

Now, each time through the while loop produces a new set of indexes. The exact cost per iteration is not straightforward, but since print is O(n), and the remaining code in the while loop is at worst O(dist), the overall cost is O(N_INCR_SEQ(n, dist) * n), where N_INCR_SEQ(n, dist) is the number of increasing sequences of natural numbers < n of length dist. Someone in the comments provides a link that gives a formula for this.

Paul Hankin
  • 54,811
  • 11
  • 92
  • 118
  • 1
    +1. Re: "N_INCR_SEQ(n, dist) is the number of increasing sequences of natural numbers < n of length dist": As the OP has already recognized, that number is (*n* [choose](https://en.m.wikipedia.org/wiki/Combination) *dist*). – ruakh Dec 11 '16 at 18:01
  • 1
    Incidentally, one implication of your analysis is that the OP could minimally fix the performance of his/her code by changing the line `if (++a[k] >= n)` to `if (++a[k] > n - dist + k)`. That would eliminate all of the unnecessary/unproductive back-and-forth passes. – ruakh Dec 11 '16 at 18:19
  • Indeed @ruakh, but I like Paul's code more, it seems more elegant to my young eye! So an overall time complexity of O((n choose dist) * n)... – gsamaras Dec 11 '16 at 20:32
  • @gsamaras: Yes, I also like Paul's code better. (Oh, except that `int a[dist]` isn't valid C++; C has variable-length arrays, so a few compilers support them in C++ as well as a compiler extension, but they're not part of the C++ standard. So I think you were right to use `std::vector` instead.) – ruakh Dec 11 '16 at 21:25
  • Yes, my code is C. The original code is clearly C++ because of the vector, but otherwise looks more like C with the C-style string and `strlen`. So I switched it to pure C (actually C99). – Paul Hankin Dec 11 '16 at 21:33
  • Yes this is [tag:c99], but that's minor guys. The essence of this answer is sweet, thanks Paul! It's also gave me a grasp for my next question, [which compares the recursive and the iterative algorithm](http://stackoverflow.com/questions/41105621/trying-to-compare-a-recursive-and-iterative-algorithm) for this problem; if you want take a look! @ruakh – gsamaras Dec 12 '16 at 17:07
  • @ruakh Just a minor disclaimer. I posted the [original code](http://stackoverflow.com/a/40836241/3246555) simply to demonstrate a non-recursive solution. I'm aware of the optimization you mentioned, but did not think it was important. Still, it is much better to have it, so thanks a lot! I updated my original post. I like Paul's algorithm, but it seems that it would cost `O(dist)` for one iteration _even if `print` would be missing_. – AlexD Dec 13 '16 at 23:20
  • 1
    @AlexD `hamming("<40 zeros>", 40)` shows the optimisation is important. My version of the code actually does the same amount of work as yours (assuming ruakh's change above in the second comment to remove the excessive scuttling). Mine does more work per iteration, but yours can do many iterations between print statements. Both versions of the code balance exactly since they're doing essentially the same thing. – Paul Hankin Dec 14 '16 at 00:02
  • @PaulHankin Sure, sure, I agree! I (mistakenly) assumed that it was for reasonably limited numbers. But even if so, I had to use `if (++a[k] > n - dist + k)` at the first place. And yes, linear cost of iteration in your case is balanced by the number of iterations in my case. So please ignore the _but_-part in the "_I like Paul's algorithm, but_" comment :). – AlexD Dec 14 '16 at 00:27
1

Notice, that given n which represents the length, and t which represents the distance required, the number of increasing, non-negative series of t integers between 1 and n (or in indices form, between 0 and n-1) is indeed n choose t, since we pick t distinct indices.

The problem occurs with your generation of those series:

-First, notice that for example in the case of length 4, you actually go over 5 different indices, 0 to 4.

-Secondly, notice that you are taking in account series with identical indices (in the case of t=2, its 0 0, 1 1, 2 2 and so on), and generally, you would go through every non-decreasing series, instead of through every increasing series.

So for calculating the TC of your program, make sure you take that into account.

Hint: try to make one-to-one correspondence from the universe of those series, to the universe of integer solutions to some equation.

If you need the direct solution, take a look here : https://math.stackexchange.com/questions/432496/number-of-non-decreasing-sequences-of-length-m


The final solution is (n+t-1) choose (t), but noticing the first bullet, in your program, its actually ((n+1)+t-1) choose (t), since you loop with one extra index. Denote

((n+1)+t-1) choose (t) =: A , n choose t =: B

overall we get O(1) + B*O(n) + (A-B)*O(1)

Community
  • 1
  • 1
Eliran Abdoo
  • 611
  • 6
  • 17
  • I don't think your "Secondly" paragraph is correct. Note that the `while`-loop starts by unconditionally incrementing `a[k]`. (For some reason this has been obfuscated by tucking it into an `if`-test; but it's there.) – ruakh Dec 11 '16 at 18:21
  • @ruakh Looking at his run example, I figured it does go through every non-decreasing series. Am I wrong? – Eliran Abdoo Dec 11 '16 at 18:38
  • His/her run example shows the contents of the vector at the *start* of each loop iteration; but the vector is modified during the loop iteration. But, it's complicated. The vector's contents can never be like `1 1 1` (with the same value repeated three times), but they can temporarily be like `1 1 4` or `1 2 2`. – ruakh Dec 11 '16 at 19:09
  • So, overall, you too agree that the algorithm I have is slower than the algorithm provided by Paul, thanks, +1. – gsamaras Dec 11 '16 at 20:38