20

There is a test with N YES or NO questions. You can write the test and the professor will tell you how many of your answers are correct. What is the fastest way to pass the test? I.e. give correct answers to all questions with the minimum number of trials.

UPD The solution with N+1 trials is obvious. On each trial we will have correct answer to a single question. It is 1 bit of information. But professor gives us a number from 0 to N each time, it is log2(N + 1) bits of information. That's why the best solution has O(N / log(N)) complexity. I'm looking for any solution with sub-linear worst time complexity.

Mikhail
  • 20,685
  • 7
  • 70
  • 146
  • 1
    basically a `probability` question---don't you think that it fits math.stackexchange.com! – Am_I_Helpful Jun 27 '14 at 10:26
  • 2
    You may be right, but I think it also perfectly fits `algorithm` tag :) – Mikhail Jun 27 '14 at 10:29
  • The fasted way is to make a good guess the first time you try ;-) – Henry Jun 27 '14 at 10:44
  • 6
    Passing that test sounds like playing [Mastermind](http://en.wikipedia.org/wiki/Mastermind_(board_game) "Wikipedia article"). (Make sure to follow the link, the article includes an 'algorithm' section.) – stakx - no longer contributing Jun 27 '14 at 10:54
  • 1
    @DanielDaranas It is not probability, there should be a strict algorithm with worst performance case and valid complexity. – Mikhail Jun 27 '14 at 11:18
  • 1
    @Mikhail: I know your information content calculation is just a sketch, but... not all answers contain the same amount of information. For example if the professors says 0 or N, you have immediately the solution. N/2 on the other hand doesn't contain much information... – Karoly Horvath Jun 27 '14 at 11:26
  • I would guess that it's hard to get the worst complexity better then O(n). My intuition is that he can force you into N+1 trials if he can decide on the fly which answers are correct. However when the answers are guaranteed to be random you might take advantage of it. – Kuba Jun 27 '14 at 11:27
  • 7
    Erdös and Rényi did a take on that problem in 1963: https://www.renyi.hu/~p_erdos/1963-12.pdf – Carsten Jun 27 '14 at 11:54
  • 1
    This is a fitting question for the algorithm tag. If you aren't interested in algorithms, don't read this tag. I wish it were possible to vote against close votes, rather than the current biased system. – Colonel Panic Jun 27 '14 at 13:16
  • 'But professor gives us a number from 0 to N each time, it is log2(N + 1) bits of information' beware! one answer can have this many bits, but with two answers, there is some (possible) amount of repeated information – josinalvo Jun 27 '14 at 13:30
  • I'm pretty sure O(N) is the best you can get. Whenever you change more than one unknown, there's always a chance that 1 item will be turned correct and 1 item will be turned incorrect, meaning you just forced an extra step into the process to find out which one is wrong. This will always force any algorithm to descend to the level of individual elements. It's possible to get lucky halfway, but the worst case is always going to be O(N) at best because of this. – Aberrant Jun 27 '14 at 13:30
  • @ColonelPanic Well, you can always vote to reopen. – Carsten Jun 27 '14 at 14:13
  • @Aberrant In the paper posted by @Carsten, Erdös proved that the maximum number of trials is `Θ(N / log(N))`. – Mikhail Jun 28 '14 at 12:09
  • @Carsten Your link along with the conclusions made at the very last page should be an accepted answer! – Mikhail Jun 28 '14 at 12:09

6 Answers6

5

The obvious improvement over the N+1 solution:

Start with all Y answers.

Then we know exactly how much yes/no there are.

Let p be the probability of having yes on any given position. p >= 1/2 without loss of generality.

Then I will reveal two first answers in average of 2 - p^2 tries.

I change my answer for the two first questions. At least p^2 of the time I will know the exact answer for both of them. If not - then I at least know that one of them is Y and the other is N and I need to ask one more question.

So in the worst case of p = 1/2 I need 1 + N * 7/8.

Kuba
  • 2,986
  • 2
  • 26
  • 32
  • Yes. And it's only with the assumption that the professor is not changing the answers during the trials. Otherwise he could always return yes/no for any pair I choose. – Kuba Jun 27 '14 at 11:58
  • Of course professor is not changing the answers, it is we who are cheating :) – Mikhail Jun 27 '14 at 13:31
3

Disclaimer: I don't know whether this is the fastest way. I'm sure that in specific scenarious you can get away with fewer trials, but this could be a strict upper bound (worst case scenario).

Fill the first trial round however you like, just remember your choices. If you prefer, you can pick No for all of them.

In the next trial, change only the first answer (following the example: pick Yes). Based on the change in the answer, you will know the correct answer for the first question (if the result increases, you gave a correct answer if not, an incorrect one).

Now change only the second one, and so on.

You need N+1 trails.

Karoly Horvath
  • 94,607
  • 11
  • 117
  • 176
3

Here is another approach (divide and conquer). for simplicity, I will refer to a particular value of N, but the generalization is straightforward.

Let N=10 and lets assume that at the first round (trial=1) we answer 5 questions correctly. This is the most likely outcome, and the one that reveals the less amount of information as has been noticed.

Then the following logic allows us to not check every single number:

  • Divide the list of answers in two sets, 1...5, 6...10. Given we have 5 correct answers, the possible correct answers for each set are

    (0,5)
    (1,4)
    (2,3)
    (3,2)
    (4,1)
    (5,0)
    

Now, for the next trial flip answers 1...5. Then the above states change as follows:

    (0,5)--> (5,5)   -- 10 correct
    (1,4)--> (4,4)   -- 8 correct
    (2,3)--> (3,3)   -- 6 correct
    (3,2)--> (2,2)   -- 4 correct
    (4,1)--> (1,1)   -- 2 correct
    (5,0)--> (0,0)   -- 0 correct

If the teacher says 10 or 0 we are done or we need one more trial respectively. In any case, depending on the number that the teacher says we can know how many correct answers we have in each interval. Then we can decide.

  • 2 Correct: We flip back (backtrack) the first 5 answers and we know that we have (4,1) correct in answers 1...5 and 6...10 respectively. So we also flip 6...10 and get to 8 correct, see below.
  • 4 Correct: As in 2 correct, we flip back the first 5 answers and also flip 6...10 and get to 6 correct, see below.
  • 6 Correct: We need to divide further and iterate. We need at most 3 more steps for each of 1...5 and 6..10, so a total of 8 steps, including the first two steps.
  • 8 Correct: We again apply binary search. we divide each of the two initial sets (for instance 1...5 is divided in 1...3, 4...5) and look for the wrong answer. If it is in 4...5we need two steps, else we need three steps. Therefore, again a total of 8 steps.
Ioannis
  • 5,238
  • 2
  • 19
  • 31
3

In the article from Carsten's comment, Erdös and Rényi show an example of how the problem can be formulated as finding the smallest number of test sequences that together can generate a unique hash for the unknown sequence. Since their example shows a sequence five digits long solved with four tests, I tried to come up with a sub-linear number of tests for sequences of length six and seven.

Looking at Erdös and Rényi's example and inspired by Ioannis' mention of "divide and conquer," I thought perhaps the test sequences kind of divide and then subdivide the sequence. It took a few tries to get working tests for sequence length seven.

Perhaps one way to think about the algorithm you are asking for could be a way to generalize / automate the generation of these test sequences.

The JavaScript program below compares test-sequences against all sequences of a given length, hashing the number of coinciding digits. If two different sequences generate the same hash, the program notifies that a match was found, meaning this combination of tests would not work. If no match was found, it means the hashes are unique and the tests ought to work.

// http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
function setBits(i)
{
     i = i - ((i >> 1) & 0x55555555);
     i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
     return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

// http://stackoverflow.com/questions/10073699/pad-a-number-with-leading-zeros-in-javascript
function pad(n, width, z) {
  z = z || '0';
  n = n + '';
  return n.length >= width ? n : new Array(width - n.length + 1).join(z) + n;
}

function numCoincidences(a,b){
    var n = 0
    for (var i=0; i<a.length; i++){
        if (a.charAt(i) == b.charAt(i)){
            n ++
        }
    }
    return n
}

var sequenceLength = 6

var tests = [
        "111111",
        "111000",
        "010010",
        "011001",
        "000100"
    ]

/***
var sequenceLength = 7

var tests = [
    "1111111",
    "1111000",
    "0100100",
    "0110010",
    "0110001",
    "0001000"
]
***/

var hash = {}

console.log("       " + tests.join(" "))

for (var i=0; i<1<<sequenceLength; i++){
    if (setBits(i) < Math.floor(sequenceLength / 2)){
      var tmp = pad(i.toString(2),sequenceLength)
      var h = ""
      for (var j in tests){
        h += numCoincidences(tests[j],tmp)
      }
      console.log(tmp + "   " + h.split("").join("      "))
      if (hash[h]){
          console.log("found match")
      } else {
          hash[h] = true
      }

    }
}

console.log("done")

Output:

"       111111 111000 010010 011001 000100" <-- test sequences
"000000   0      3      4      3      5"
"000001   1      2      3      4      4"    <-- sequences to match, followed by
"000010   1      2      5      2      4"             the number of coincidences 
"000011   2      1      4      3      3" 
"000100   1      2      3      2      6" 
"000101   2      1      2      3      5" 
"000110   2      1      4      1      5" 
"000111   3      0      3      2      4" 
"001000   1      4      3      4      4" 
"001001   2      3      2      5      3" 
"001010   2      3      4      3      3" 
"001011   3      2      3      4      2" 
"001100   2      3      2      3      5" 
"001101   3      2      1      4      4" 
"001110   3      2      3      2      4" 
"010000   1      4      5      4      4" 
"010001   2      3      4      5      3" 
"010010   2      3      6      3      3" 
"010011   3      2      5      4      2" 
"010100   2      3      4      3      5" 
"010101   3      2      3      4      4" 
"010110   3      2      5      2      4" 
"011000   2      5      4      5      3" 
"011001   3      4      3      6      2" 
"011010   3      4      5      4      2" 
"011100   3      4      3      4      4" 
"100000   1      4      3      2      4" 
"100001   2      3      2      3      3" 
"100010   2      3      4      1      3" 
"100011   3      2      3      2      2" 
"100100   2      3      2      1      5" 
"100101   3      2      1      2      4" 
"100110   3      2      3      0      4" 
"101000   2      5      2      3      3" 
"101001   3      4      1      4      2" 
"101010   3      4      3      2      2" 
"101100   3      4      1      2      4" 
"110000   2      5      4      3      3" 
"110001   3      4      3      4      2" 
"110010   3      4      5      2      2" 
"110100   3      4      3      2      4" 
"111000   3      6      3      4      2" 
"done"
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
  • 1. I thought Erdös is dealing with arbitrary `N`? 2. You claim that the given sequence of 5 trials will solve the test with `N=6`, because all rows in the table are different, right? 3. How did you find this sequene? Brute force? 4. How long will that sequence be for arbitrary `N`? – Mikhail Jun 30 '14 at 04:53
  • @Mikhail The math in Erdös and Rényi's Article is too advanced for me. I assumed that their function, `B(n)` - the number of test-sequences needed, would decrease in relation to `n` as `n` grows large, but I could be wrong and for some arbitrary large `n` it could again be equal to `n`. They seeem to prove `B(n) <= n`, generally; and if I understand correctly, `B(n)` tends to `2n / log n` for sufficiently large `n`. – גלעד ברקן Jun 30 '14 at 16:07
  • @Mikhail 2. Yes, I claim that the given sequence of 5 trials will solve the test with N=6 because all rows in the table are different - clearly we have a unique hash for each unknown sequence of length six. 3. I found the tests by trial and error - I describe my thinking in the second paragraph of my answer. – גלעד ברקן Jun 30 '14 at 16:11
1

The naive solution would involve O(N) attempts: start with all answers YES, then in every ith try flip the ith answer. If your score increased, keep it; if not, flip it back. Increment i, repeat.

A more efficient solution might involve a very simple genetic algorithm, where the heuristic is the professor's answer, and mutation might be equivalent to simply flipping all the answers. This would probably approach O(log N) tries, but of course the multiplicative constant would be larger (by at least an order of magnitude, if I had to guess), so it would only be feasible for large N.

More Axes
  • 249
  • 3
  • 12
1

Some Python code for a trivial, O(n) algorithm:

import random

def ask_prof(A, M): return sum(x == y for x, y in zip(M, A)) 

N = 10
A = [ random.randint(0, 1) for x in range(N) ]
K, s = [], 0
for trial in range(N):
    M = K + [ 0 for x in range(N - len(K)) ]
    s1 = ask_prof(A, M)
    M[len(K)] = 1 
    s2 = ask_prof(A, M)
    if s1 < s2: K.append(1)
    else: K.append(0)
print 'answers are', K
perreal
  • 94,503
  • 21
  • 155
  • 181