1275

I had an interesting job interview experience a while back. The question started really easy:

Q1: We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.

I've heard this interview question before, of course, so I very quickly answered along the lines of:

A1: Well, the sum of the numbers 1 + 2 + 3 + … + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.

Thus, if all numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number in O(N) time and O(1) space.
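
For illustration, a minimal Python sketch of A1 (the function name and the list-based "bag" are my own assumptions):

def find_missing_one(bag, n):
    # expected sum of 1..n, minus what is actually in the bag
    return n * (n + 1) // 2 - sum(bag)

# e.g. find_missing_one([1, 2, 4, 5], 5) == 3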

At this point I thought I had done well, but all of a sudden the question took an unexpected turn:

Q2: That is correct, but now how would you do this if TWO numbers are missing?

I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc, but I really was just shooting in the dark rather than actually having a clear path to the solution.

The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer beforehand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.

The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.

Qk: If exactly k numbers are missing from the bag, how would you find them efficiently?

This was a few months ago, and I still can't figure out what this technique is. Obviously there's an Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N) time input scan) is defined in terms of k, not N.

So the question here is simple:

  • How would you solve Q2?
  • How would you solve Q3?
  • How would you solve Qk?

Clarifications

  • Generally there are N numbers from 1..N, not just 1..100.
  • I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence of each number by the value of a designated bit, therefore using O(N) bits of additional space. We can't afford any additional space proportional to N.
  • I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).

So again, of course you must scan the input in O(N), but you can only capture a small amount of information (defined in terms of k, not N), and must then find the k missing numbers somehow.

Gulzar
  • 23,452
  • 27
  • 113
  • 201
polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
  • 2
    Note that the run-time can't be independent of `k`. If you keep track of just `k` pieces of extra information, you need `O(k N)` time to update each extra piece whenever you touch a new number from the input. Also, the solution with the sum already keeps track of `2 log N` bits. – Heinrich Apfelmus Aug 16 '10 at 12:40
  • 10
    @polygenelubricants Thank You for the clarifications. "I'm looking for an algorithm that uses O(N) time and O(K) space where K is the count of absent numbers" would have been clear from the beginning on ;-) – Dave O. Aug 16 '10 at 17:29
  • 8
    You should precise, in the statement of Q1 that you cannot access the numbers in order. This probably seems obvious to you, but I've never heard of the question and the term "bag" (which means "multiset" as well) was sort of confusing. – Jérémie Aug 16 '10 at 18:44
  • 8
    Please read the following as the answers provided here are ridiculous: http://stackoverflow.com/questions/4406110/missing-numbers-interview-question-redux –  Dec 26 '10 at 09:10
  • Use the first number's memory for a bit_set! :-) You need space relative to N, as all the N numbers may be missing! (empty set) – CMR Nov 16 '11 at 18:22
  • 27
    The solution of summing the numbers requires log(N) space unless you consider the space requirement for an unbounded integer to be O(1). But if you allow for unbounded integers, then you have as much space as you want with just one integer. – Udo Klein Apr 10 '13 at 05:53
  • 1
    As @UdoKlein stated, you require log(N) space for big N. For the same reason you also need O(log(N)*N) time for the summation. – Chronial Jul 17 '13 at 17:07
  • 1
    are the numbers in sequence? if it were like 1, 2, *, 4, 5, * and N = 6, then iterate i -> 1 to N and increment i. Each time you don't find match that number is missing.. I am guessing it was unsorted right? else it wouldn't be a challenge. – David Prun Sep 29 '14 at 05:00
  • 12
    By the way pretty nice alternative solution for Q1 could be computing `XOR` of all numbers from `1` to `n`, then xoring result with all numbers in the given array. In the end you have your missing number. In this solution you don't need to care about overflow as in summing up. – sbeliakov Sep 23 '15 at 15:32
  • 1
    I think the following approach works in general. (The caveat is that it essentially encodes the bitset as a huge number. :-) Let `a=sum_{s in S} 2^s`. Now infer the missing numbers as indexes of turned-off bits. Both can be done in linear time; the big integer of course takes a lot of space (so it's basically cheating). – blazs Feb 25 '16 at 15:48
  • 2
    For Q2 and Qk, I believe you need to return a set of 2 or k terms which add up as the difference of the sum of the numbers of the range. For eg: for range of 1 to 100 and there are 2 missing numbers, and the sum of the numbers is 5040, then the set of possible numbers that are missing are either { (1,9) , (2,8) , (3,7) , (4,6) } The same would imply for 'k' missing numbers as well. – Krish Munot Jul 09 '16 at 15:59
  • 1
    Bonus question: what is the best space complexity that can be achieved with unlimited time (and vice versa) assuming 2 numbers are missing? – Nickpick Aug 18 '16 at 17:33
  • 4
    It is an artificial question. The bag itself already consumes O(N) space, using a bit array to keep track of the elements in the bag would not make this worse. – toongeorges Jul 10 '17 at 12:51
  • A bag?? Well assuming it's a real bag with real balls and I must go through it then I'll grab a paper and a pencil and put marks on 10x10 grid as I go through each ball by hand, I'm not gonna start doing polynomials or XORing the bits of each number in my head am I? And wow guess what it scales up to any number of missing balls. I'm not really missing the point of the question if the question is poorly thought up. – Michel Rouzic Jan 24 '19 at 14:22
  • Dumb question but wouldn't just feeding the list into a Python set and checking if i is in the set for i=1...100 (n=100) be amortized O(n) too? – xjcl Apr 08 '19 at 22:53
  • Since Fibonacci Heaps have O(1) insertion and O(log n) deletion shouldn't they admit an O(n + k log n) or O(n + k log k) solution? – xjcl Apr 08 '19 at 23:51
  • What kind of programming job needs this kind of skill with inventing or knowing algorithms? Are interviewers testing for skills needed in the position, or are they testing to make sure the candidate went through a CS program like the interviewer did? – Wayne Conrad Dec 13 '22 at 17:41

49 Answers

636

Here's a summary of Dimitris Andreou's link.

Remember sum of i-th powers, where i=1,2,..,k. This reduces the problem to solving the system of equations

a1 + a2 + ... + ak = b1

a1^2 + a2^2 + ... + ak^2 = b2

...

a1^k + a2^k + ... + ak^k = bk

Using Newton's identities, knowing the bi allows us to compute

c1 = a1 + a2 + ... + ak

c2 = a1*a2 + a1*a3 + ... + a(k-1)*ak

...

ck = a1*a2*...*ak

If you expand the polynomial (x-a1)...(x-ak), its coefficients will be exactly c1, ..., ck (up to sign) - see Viète's formulas. Since every polynomial factors uniquely (the ring of polynomials is a Euclidean domain), this means the ai are uniquely determined, up to permutation.

This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.

However, when k is varying, the direct approach of computing c1,...,ck is prohibitively expensive, since e.g. ck is the product of all missing numbers, with magnitude up to n!/(n-k)!. To overcome this, perform the computations in the field Zq, where q is a prime such that n <= q < 2n - one exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

  • Compute i-th powers of given numbers
  • Subtract to get sums of i-th powers of unknown numbers. Call the sums bi.
  • Use Newton's identities to compute the coefficients from the bi; call them ci. Basically, c1 = b1; c2 = (c1*b1 - b2)/2; see Wikipedia for the exact formulas
  • Factor the polynomial x^k - c1*x^(k-1) + c2*x^(k-2) - ... + (-1)^k*ck.
  • The roots of the polynomial are the needed numbers a1, ..., ak.

For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.

EDIT: The previous version of this answer stated that instead of Zq, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.
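
For concreteness, here is a rough sketch of these steps (my own code, not the answer's) for small, constant k, working over the integers and using numpy's numeric root finder in place of finite-field factorization; for large k you would switch to arithmetic mod q as described above:

import numpy as np

def find_missing(bag, n, k):
    # b[i-1] = sum of i-th powers of the missing numbers
    b = [sum(x**i for x in range(1, n + 1)) - sum(x**i for x in bag)
         for i in range(1, k + 1)]

    # Newton's identities: i*c_i = sum_{j=1..i} (-1)^(j-1) * c_{i-j} * b_j
    c = [1]
    for i in range(1, k + 1):
        c.append(sum((-1)**(j - 1) * c[i - j] * b[j - 1]
                     for j in range(1, i + 1)) // i)

    # The roots of x^k - c1*x^(k-1) + c2*x^(k-2) - ... + (-1)^k*ck
    # are exactly the missing numbers.
    coeffs = [(-1)**i * c[i] for i in range(k + 1)]
    return sorted(int(round(r.real)) for r in np.roots(coeffs))

# e.g. find_missing([1, 2, 3, 5, 7, 8, 9, 10], 10, 2) -> [4, 6]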

sdcvvc
  • 25,343
  • 4
  • 66
  • 102
  • 6
    You don't have to use a prime field, you can also use `q = 2^(log n)`. (How did you make the super- and subscripts?!) – Heinrich Apfelmus Aug 16 '10 at 12:45
  • 3
    Also, you can calculate the c_k on the fly, without using the power sums, thanks to the formula $c^{k+1}_m = c^k_{m+1} + c^k_m x_{k+1}$ where the superscript $k$ denotes the number of variables and $m$ the degree of the symmetric polynomial. – Heinrich Apfelmus Aug 16 '10 at 12:50
  • @Heinrich Apfelmus Thanks! I'm not sure about the power of two approach - in general factorization would be much tricker, since there are divisors of zero and no uniqueness. To make sub/superscripts in answers, HTML tags can be used, in comments I think it is not available. – sdcvvc Aug 16 '10 at 12:59
  • 62
    +1 This is really, really clever. At the same time, it's questionable, whether it's really worth the effort, or whether (parts of) this solution to a quite artificial problem can be reused in another way. And even if this were a real world problem, on many platforms the most trivial `O(N^2)` solution will probably possibly outperform this beauty for even reasonably high `N`. Makes me think of this: http://tinyurl.com/c8fwgw Nonetheless, great work! I wouldn't have had the patience to crawl through all the math :) – back2dos Aug 16 '10 at 13:52
  • 1
    @sdcvvc Ah, I mean the [finite field](http://en.wikipedia.org/wiki/Finite_field) with `q=2^(log n)` elements, not the ring of bit vectors with `log n` bits. Being a field, the former doesn't have any zero divisors; of course, multiplication is a bit more complicated than a simple bit-wise AND. – Heinrich Apfelmus Aug 16 '10 at 13:57
  • 1
    @Heinrich Apfelmus: Thanks, added. @back2dos: The book in Dimitris's answer mentions applications of these techniques in communication complexity (set reconcilation). – sdcvvc Aug 16 '10 at 14:12
  • 215
    I think this is a wonderful answer. I think this also illustrates how poor of an interview question it would be to extend the missing numbers beyond one. Even the first is kind of a gotchya, but it's common enough that it basically shows "you did some interview prep." But to expect a CS major to know go beyond k=1 (especially "on the spot" in an interview) is a bit silly. – corsiKa Mar 25 '11 at 21:03
  • 2
    How exactly do you factor a degree-k polynomial quickly? – Nemo Nov 09 '12 at 04:57
  • Nemo: I linked the algorithms. Note that for purpose of this question, k is a constant, so you don't need a fast algorithm, only one whose complexity does not depend on n. – sdcvvc Nov 11 '12 at 11:10
  • 7
    This is effectively doing Reed Solomon coding on the input. – David Ehrmann Mar 02 '14 at 17:26
  • 108
    I bet entering all number in a `hash set` and iterating over the `1...N` suite using lookups to determine if numbers are missing, would be the most generic, fastest in average regarding `k` variations, most debuggable most maintainable and understandable solution. Of course the math way is impressive but somewhere along the way you need to be an engineer and not a mathematician. Especially when business is involved. – v.oddou Apr 03 '14 at 07:54
  • 1
    While I find it wonderful I wonder if there are any reason to ask this also test logical thinking. I mean, is there a real use of it? – Jack Aug 26 '14 at 13:05
  • It would be helpful to have a concretely example how the pseudo code works because I am still a little bit confused. – Steven Dec 04 '14 at 18:40
  • This can be solved as a tiny one or two liner in python, you are you over-engeneering this. – Caridorc Feb 28 '15 at 17:10
  • 5
    @Caridorc: That one-liner will not satisfy the requirements of the question, which asks for space complexity independent of N. – sdcvvc Mar 01 '15 at 18:51
  • 1
    It is theoretically nice, but practically I guess it would be faster to make just array of bits and find the missing number just by settings bits. Furthermore restriction to integers range does not allow to compute such big power(x,i) for bigger numbers. Am I right or not? – Tomas Kubes Mar 21 '15 at 06:34
  • 1
    @qub1n: It would be faster, but the point of the question is an algorithm which does not use memory proportional to N (see "Clarifications"). As for integer range, I am doing computation modulo some prime q, which means numbers will not get huge. – sdcvvc Mar 24 '15 at 08:41
  • 2
    You don't need to factor the resulting polynomial; just divide it in succession by `(x-i)` for `i=1..100`. If you get a non-zero remainder, `(x-i)` is not a factor. – Edward Doolittle Apr 14 '15 at 23:00
  • In fact, you don't even need to divide it by `(x-i)`. Just substitute `i` into the polynomial. The same argument works for `mod p`, where `p` is a prime greater than `100` or whatever is the largest number on the list. The calculations are entirely straightforward and fast ... I don't know why anyone would solve this problem in another way. – Edward Doolittle Apr 21 '15 at 05:02
  • 1
    @EdwardDoolittle Trying each `i` takes Omega(N) time, while the question asked that after scanning the input the time/space complexity should be independent of N. (Technically, there is a log N factor because we are computing with integers from range [0..N] (or similar) but it's usually disregarded in the RAM model.) – sdcvvc Apr 21 '15 at 18:51
  • @sdcvvc Gosh ... you're right. I should read more carefully. I see no alternative to factoring the polynomial. Thanks. – Edward Doolittle Apr 21 '15 at 19:45
  • The easiest way is to hijack the `takeout()` or `add()` method and keep a record of what's out/in. It won't change the complexity of `takeout()`/`add()` but reduce this "K missing" problem to O(1). – user3528438 Oct 12 '15 at 01:59
  • 2
    Definitely memorize this solution and if you get this question in an interview pretend you're making up the solution on the spot. Then tell us what question you get next. – Aurast Mar 29 '16 at 21:00
  • @DavidEhrmann Can you expand on how one might think of this as a coding theoretic problem? – Thomas Ahle Apr 26 '16 at 13:25
  • 1
    @sdcvvc So if we use a finite field, does this only use `k log n` bits of memory? That seems better than the `k^2 logn` memory described in the article as optimal. If we just use the natural numbers, each power-sum can be up to `n^k` and thus `k` of them takes `k^2 logn` memory. This is similar to what you get randomized with hashing, http://stackoverflow.com/a/36851791/205521 but I wonder if this solution is more efficient? – Thomas Ahle Apr 26 '16 at 13:28
  • 2
    @v.oddou: Or even `missing_numbers = set(range(100)) - bag` - voila! – Claudiu Apr 26 '16 at 16:16
  • 2
    How do they expect someone to give this answer on the spot in an interview? – David G May 09 '16 at 18:18
  • 2
    @0x499602D2: Presumably the guy who manages to give this answer on the spot gets the job. I've seen companies leaving highly specialist positions (device driver programmer, probability analyst etc.) open for years in search of the candidate that can do the exact types of problems they need solving. In such cases they're not looking for programmers, they're looking for mathematicians or physicists who can program. – slebetman Aug 29 '16 at 06:11
  • It would be great if @sdcvvc could provide a numeric solution for some small `N` and `k`, for example, take `N=5, k=3` and follow the computations. Thanks! – dma_k Sep 02 '16 at 08:51
  • @FilipHaglund No, this is supposed to be the sum of a_i * a_j where i < j. a1a2 + a2a3 + ... would suggest the cyclic sum, which changes if a_i is permuted (while the roots have no natural order) – sdcvvc Oct 17 '16 at 19:52
  • @sdcvvc then what's c3 and c4? I don't see the pattern. – Filip Haglund Oct 17 '16 at 20:18
  • 1
    @FilipHaglund in Python: c3 = `sum(a[x]*a[y]*a[z] for x in range(k) for y in range(x+1,k) for z in range(y+1,k))`, c4 = `sum(a[x]*a[y]*a[z]*a[t] for x in range(k) for y in range(x+1,k) for z in range(y+1,k) for t in range(z+1,k))` etc. – sdcvvc Oct 17 '16 at 22:35
  • Actually another solution for case k=2 computes the sum, decides from that whether the two missing numbers have the same or different parities. If they have different parities then compute the sum of odd - sum of even. Comparing with the true result this gives you the difference of the two numbers and hence combined with the sum you can find the two numbers uniquely. If the parities are both the same - both even or both odd then you can find out from the difference of odds and evens which this is. You then search only in this restricted set and repeat. – Ivan Jun 27 '17 at 11:29
  • Also, for k=2 see http://www.geeksforgeeks.org/find-two-missing-numbers-set-1-an-interesting-linear-time-solution/ – PlsWork Oct 21 '17 at 22:29
  • 1
    Whats the space complexity? – PlsWork Oct 21 '17 at 23:53
  • Hey we need O(N) space to keep the n `unique` number somewhere. **So how is it better than O(N) bit-mask ?** The bit-mask is the smallest space required to represent an unique value because each number can either appear or not(hence 2 possibility being 1 or 0). – KRoy Dec 17 '17 at 20:23
  • 1
    @shuva The algorithm is not remembering numbers that come in the stream. They come as input, the "bag" from the question is not part of the algorithm memory. – sdcvvc Jan 26 '18 at 22:31
  • 2
    @corsiKa This question was asked in the German final round of the selection process for the International Olympiad in Informatics ("Bundeswettbewerb Informatik"). So for people **before** they enter university. Out of ~30 students, I think all of us got a solution for k=1 and k=2. One of the four teams got the solution for arbitrary k, IIRC. Oh, and it was about streaming algorithms, meaning you were not allowed to loop over the input multiple times (which is the obvious solution to the question as asked above) – Martin Thoma Jun 15 '18 at 21:32
  • 1
    @MartinThoma To be honest, I know some kids that would have had the skills and patience to do this kind of thing in high school. I could see the sorts of kids who excel at such events knowing this. I think expecting a typical university graduate to know this is complete hogwash. – corsiKa Jun 16 '18 at 04:06
  • 2
    @sdcvvc Finite field arithmetic doesn't quite work with this algorithm. For example: With n=16, k=2, we can't distinguish (a₁=0, a₂=1) from (a₁=2, a₂=3). Values in GF(16) with primitive polynomial x⁴ + x + 1: For (a₁=0,a₂=1): b₁=a₁+a₂=0+1=1, b₂=a₁²+a₂²=0+1=1. For (a₁=2,a₂=3): b₁=a₁+a₂=2+3=1, b₂=a₁²+a₂²=4+5=1. (I used [these addition and multiplication tables](http://www.ee.unb.ca/cgi-bin/tervo/galois3.pl?p=4&C=1&D=1&A=1).) – jcsahnwaldt Reinstate Monica Dec 06 '18 at 12:31
  • @HeinrichApfelmus Maybe finite field arithmetic works if we don't use exponents that are powers of two? For example, for k=2, instead of the sum of squares we use the sum of cubes. I'm just guessing... (Of course, we'll have to adapt the algorithm that calculates a₁, a₂, .. from the sums.) – jcsahnwaldt Reinstate Monica Dec 06 '18 at 12:37
  • 1
    I should have said that finite field arithmetic over GF(2^x) with x > 0 doesn't work here. I think finite field arithmetic over GF(p) with p prime works. – jcsahnwaldt Reinstate Monica Dec 08 '18 at 13:26
  • 2
    @jcsahnwaldt You are right, thank you for catching this. In characteristic 2, x^2+y^2=(x+y)^2, therefore knowing x+y and x^2+y^2 is the same as knowing x+y only - that fixes only one degree of freedom instead of two. The reason is that Newton's identities require to divide by numbers up to k. So the algorithm requires using GF(p^x) where p > k and x > 0. – sdcvvc Dec 08 '18 at 15:01
  • Seems like a very complex solution, considering OP couldn't find a simple solution, I'd encourage him to ignore this one until he finds a simpler way. – TZubiri May 06 '20 at 22:52
  • @DavidEhrmann could you elaborate on how this equates to Reed-Solomon coding? – ambiso Dec 15 '20 at 12:20
  • How do you factor the polynomial efficiently? – lalala May 31 '21 at 17:52
  • Why use modulu only for varying k and not always? – Gulzar Jun 10 '21 at 08:42
  • See [a video explanation](https://mindyourdecisions.com/blog/2019/12/04/the-sum-of-5th-powers-solve-if-you-are-a-genius/) of this algorithm, solved to more detail – Gulzar Jun 12 '21 at 14:14
  • _"ring of polynomials is an Euclidean domain"_ While Euler is pronounced OY-ler, Euclid (and thus Euclidean) is pronounced YOO-klid. Just to be as difficult as possible :) – Clonkex Mar 09 '23 at 04:07
263

You will find it in the first couple of pages of Muthukrishnan - Data Stream Algorithms: Puzzle 1: Finding Missing Numbers. It shows exactly the generalization you are looking for. Probably this is what your interviewer read and why he posed these questions.


Also see sdcvvc's directly related answer, which also includes pseudocode (hurray! no need to read those tricky math formulations :)) (thanks, great work!).

iacob
  • 20,084
  • 6
  • 92
  • 119
Dimitris Andreou
  • 8,806
  • 2
  • 33
  • 36
  • Oooh... That's interesting. I have to admit I got a bit confused by the maths but I was jsut skimming it. Might leave it open to look at more later. :) And +1 to get this link more findable. ;-) – Chris Aug 16 '10 at 12:21
  • 4
    The google books link doesn't work for me. Here a [better version](http://www.cs.rutgers.edu/~muthu/stream-1-1.ps) [PostScript File]. – Heinrich Apfelmus Aug 16 '10 at 12:31
  • 14
    Wow. I didn't expect this to get upvoted! Last time I posted a reference to the solution (Knuth's, in that case) instead of trying to solve it myself, it was actually downvoted: http://stackoverflow.com/questions/3060104/how-to-implement-three-stacks-using-a-single-array/3077753#3077753 The librarian inside me rejoices, thanks :) – Dimitris Andreou Aug 16 '10 at 12:33
  • @Apfelmus, note that this is a draft. (I don't blame you of course, I confused the draft for the real things for almost a year before finding the book). Btw if the link didn't work, you can go to http://books.google.com/ and search for "Muthukrishnan data stream algorithms" (without quotes), it's the first to pop up. – Dimitris Andreou Aug 16 '10 at 12:36
  • 2
    Please read the following as the answers provided here are ridiculous: http://stackoverflow.com/questions/4406110/missing-numbers-interview-question-redux –  Dec 26 '10 at 09:12
  • The book says that the best space bound is `k^2 logn` bits, with a randomized factoring algorithm? Isn't that the same as what you'd get by simply hashing `{1,...,n} -> {1,...,k^2}`? I added an answer with this solution. That should be the most obvious for computer scientists at least. – Thomas Ahle Apr 25 '16 at 21:56
188

We can solve Q2 by summing both the numbers themselves, and the squares of the numbers.

We can then reduce the problem to

k1 + k2 = x
k1^2 + k2^2 = y

Where x and y are how far the sums are below the expected values.

Substituting gives us:

(x-k2)^2 + k2^2 = y

Which we can then solve to determine our missing numbers.
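
A small Python sketch of this (the function name is an assumption; math.isqrt is safe here because 2y - x^2 equals (k1 - k2)^2, a perfect square):

import math

def find_two_missing(bag, n):
    x = n * (n + 1) // 2 - sum(bag)                               # k1 + k2
    y = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in bag)  # k1^2 + k2^2
    # (x - k2)^2 + k2^2 = y  =>  k2 = (x +/- sqrt(2y - x^2)) / 2
    d = math.isqrt(2 * y - x * x)
    return (x - d) // 2, (x + d) // 2

# e.g. find_two_missing([1, 2, 3, 5, 7, 8, 9, 10], 10) == (4, 6)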

Anon.
  • 58,739
  • 8
  • 81
  • 86
  • 7
    +1; I've tried the formula in Maple for select numbers and it works. I still couldn't convince myself WHY it works, though. – polygenelubricants Aug 16 '10 at 11:12
  • @polygenelubricants: Rearranging the first equation gives `k1 = x - k2`, and we can substitute that into the second equation. You can also generalize this to higher k - though it becomes computationally expensive quite rapidly. – Anon. Aug 16 '10 at 11:24
  • 1
    @Anon: let's restrict discussion to Q2 for now. Can you show that this will always result in a solution? (i.e. no ambiguity, no unsolvable, always unique and correct solution). – polygenelubricants Aug 16 '10 at 11:28
  • 4
    @polygenelubricants: If you wanted to prove correctness, you would first show that it always provides *a* correct solution (that is, it always produces a pair of numbers which, when removing them from the set, would result in the remainder of the set having the observed sum and sum-of-squares). From there, proving uniqueness is as simple as showing that it only produces one such pair of numbers. – Anon. Aug 16 '10 at 11:50
  • 6
    The nature of the equations means that you will get two values of k2 from that equation. However, from teh first equation that you use to generate k1 you can see that these two values of k2 will mean that k1 is the other value so you have two solutions that are the same numbers the opposite way around. If you abitrarily declared that k1>k2 then you'd only have one solution to the quadratic equation and thus one solution overall. And clearly by the nature of the question an answer always exists so it always works. – Chris Aug 16 '10 at 12:06
  • 3
    For a given sum k1+k2, there are many pairs. We can write these pairs as K1=a+b and K2 = a-b where a = (K1+k2/2). a is unique for a given sum. The sum of the squares (a+b)**2 + (a-b)**2 = 2*(a**2 + b**2). For a given sum K1+K2, the a**2 term is fixed and we see that the sum of the squares will be unique due to the b**2 term. Therefore, the values x and y are unique for a pair of integers. – phkahler Aug 16 '10 at 14:31
  • @Anon. Could you explain this further. I am trying to understanding this. Can you actually input numbers for me so I can visualize what I am not getting – user3281743 Oct 24 '14 at 20:26
  • 8
    This is awesome. @user3281743 here's an example. Let the missing numbers (k1 and k2) be 4 and 6. Sum(1 -> 10) = 55 and Sum(1^2 -> 10^2) = 385. Now let x = 55 - (Sum(All remaining numbers)) and y = 385 - (Sum(Squares of all remaining numbers)) thus x = 10 and y = 52. Substitute as shown which leaves us with: (10 - k2)^2 + k2^2 = 52 which you can simplify to: 2k^2 - 20k + 48 = 0. Solving the quadratic equation gives you 4 and 6 as the answer. – AlexKoren Oct 12 '15 at 02:07
153

I asked a 4-year-old to solve this problem. He sorted the numbers and then counted along. This has a space requirement of O(kitchen floor), and it works just as easily no matter how many balls are missing.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Colonel Panic
  • 132,665
  • 89
  • 401
  • 465
  • 26
    ;) your 4 year old must be approaching 5 or/and is a genius. my 4 year old daughter cannot even count properly to 4 yet. well to be fair let's say she just barely finally integrated the "4"'s existence. otherwise until now she would always skip it. "1,2,3,5,6,7" was her usual counting sequence. I asked her to add pencils together and she would manage 1+2=3 by denumbering all again from scratch. I'm worried actually... :'( meh.. – v.oddou Apr 03 '14 at 08:07
  • 9
    O(kitchen floor) haha - but wouldn't that be O(n^2) ? –  Jul 09 '16 at 15:42
  • 22
    O(m²) i guess :) – Viktor Mellgren Jun 26 '17 at 12:19
  • sorting is definitely not O(n) – phuclv Apr 07 '19 at 03:49
  • 6
    @phuclv: the answer stated that "This has a **space** requirement of O(kitchen floor)". But in any case, this is an instance where sorting *can* be achieved in *O(n)* time --- see [this discussion](https://stackoverflow.com/questions/19533693/sorting-in-on-time). – Anthony Labarre Apr 23 '19 at 11:03
152

As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too.

Assuming that the "bag" is represented by a 1-based array A[] of size N - k, we can solve Qk in O(N) time and O(k) additional space.

First, we extend our array A[] by k elements, so that it is now of size N. This is the O(k) additional space. We then run the following pseudo-code algorithm:

for i := n - k + 1 to n
    A[i] := A[1]
end for

for i := 1 to n - k
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end while
end for

for i := 1 to n
    if A[i] != i then 
        print i
    end if
end for

The first loop initialises the k extra entries to the same as the first entry in the array (this is just a convenient value that we know is already present in the array - after this step, any entries that were missing in the initial array of size N-k are still missing in the extended array).

The second loop permutes the extended array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The third loop prints those indexes of the array i that are not occupied by the value i - this means that i must have been missing.
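
A rough Python translation of the pseudocode above (my own sketch; 0-based indexing, so value v lives at index v - 1, and the list is modified in place just like the original):

def find_missing_in_place(a, n, k):
    a.extend([a[0]] * k)              # first loop: pad with a value known to be present
    for i in range(n):                # second loop: move each value v to index v - 1
        while a[a[i] - 1] != a[i]:
            j = a[i] - 1
            a[i], a[j] = a[j], a[i]
    return [i + 1 for i in range(n) if a[i] != i + 1]   # third loop

# e.g. find_missing_in_place([1, 2, 4, 6], 6, 2) -> [3, 5]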

caf
  • 233,326
  • 40
  • 323
  • 462
  • 8
    I wonder why so few people vote this answer up and even did not mark it as a correct answer. Here is the code in Python. It runs in O(n) time and need extra space O(k). http://pastebin.com/9jZqnTzV – wall-e Oct 22 '12 at 04:03
  • 1
    @wall-e: Well, I *was* several months late to this party. – caf Oct 22 '12 at 05:49
  • 4
    @caf this is quite similar to setting the bits and counting the places where the bit is 0. And I think as you are creating an integer array more memory is occupied. – Fox Apr 22 '13 at 06:41
  • 6
    "Setting the bits and counting the places where the bit is 0" requires O(n) extra space, this solution shows how to use O(k) extra space. – caf Dec 12 '13 at 23:19
  • 12
    Doesn't work with streams as input and modifies the input array (though I like it very much and the idea is fruitful). – comco Jan 30 '14 at 14:07
  • 1
    @comco He could have just as easily used separate storage rather than mutating the existing array. The concept/structure is the same. AFA streams: how would **any** of these algorithms operator on streams? – WestCoastProjects Mar 02 '14 at 17:03
  • Rather than declaring a "winner" between this and the accepted answer I'd say both are useful but being a non-math nerd find this one more approachable and immediately useful for day-to-day programming. – WestCoastProjects Mar 02 '14 at 17:05
  • You need to think this one again: `while A[A[i]] != A[i] swap(A[i], A[A[i]])` if `x != y` doing `swap x,y` will not make them equal. Your while loop just never ends. if it was to enter at least once, your algo never finishes. – v.oddou Apr 03 '14 at 08:02
  • 4
    @v.oddou: Nope, it's fine. The swap will change `A[i]`, which means that the next iteration won't be comparing the same two values as the previous one. The new `A[i]` will be the same as the last loop's `A[A[i]]`, but the new `A[A[i]]` will be a *new* value. Try it and see. – caf Apr 03 '14 at 10:55
  • 1
    @caf: oh right, I ran it (on paper) and indeed it does what you say. So nothing to see here, moving on. :) – v.oddou Apr 04 '14 at 01:09
  • A[] does not need to be extended if multiple passes are used. Let m = n-k = size of A[]. Position values x = 0 to m-1 so A[x] = x, then show missing values for x = 0 to m-1. Next position values x = m to min(2m, n)-1 so that A[x-m] = x, then show values for x = m to min(2m,n)-1. If 2m < n, then repeat again for x = 2m to min(3m,n)-1 (A[x-2m] == x). Perform y passes where y is smallest integer such that y*m >= n. Since y is a linear multiplier, the process is still O(n). – rcgldr Apr 12 '16 at 17:15
  • Beautiful answer to the problem as posed, and in fact far more practical in that situation (since the "expected" approach of summing powers of numbers requires arbitrary-precision integers and so takes too much space and time.) Though I suspect the spirit of the problem is looking at each of your inputs once only. Maybe a better formulation would be having numbers assigned to people and doing a census in a single pass or something? – sfink Nov 27 '18 at 01:37
  • 1
    This modifies the original array in more than just extending it. Can it be said to be O(k) space when the whole array gets actually operated on? Most of the times we are not allowed to mess with the input. – piepi May 01 '21 at 22:41
38

I'm not sure if it's the most efficient solution, but I would loop over all entries, use a bitset to remember which numbers are present, and then test for 0 bits.

I like simple solutions - and I even believe that it might be faster than calculating the sum, or the sum of squares, etc.
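
A minimal sketch of that idea (note it uses O(N) extra space, which is exactly what the question's clarification rules out, but it is hard to beat for simplicity):

def find_missing_bitset(bag, n):
    present = bytearray(n + 1)        # one flag per number; index 0 unused
    for v in bag:
        present[v] = 1
    return [i for i in range(1, n + 1) if not present[i]]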

Chris Lercher
  • 37,264
  • 20
  • 99
  • 131
  • 15
    I did propose this obvious answer, but this is not what the interviewer wanted. I explicitly said in the question that this is not the answer I'm looking for. Another obvious answer: sort first. Neither the `O(N)` counting sort nor `O(N log N)` comparison sort is what I'm looking for, although they are both very simple solutions. – polygenelubricants Aug 16 '10 at 11:14
  • @polygenelubricants: I can't find where you said that in your question. If you consider the bitset to be the result, then there is no second pass. The complexity is (if we consider N to be constant, as the interviewer suggests by saying, that the complexity is "defined in *k* not N") O(1), and if you need to construct a more "clean" result, you get O(k), which is the best you can get, because you always need O(k) to create the clean result. – Chris Lercher Aug 16 '10 at 11:20
  • "Note that I'm not looking for the obvious set-based solution (e.g. using a bit set,". The second last paragraph from the original question. – hrnt Aug 16 '10 at 11:24
  • 9
    @hmt: Yes, the question was edited a few minutes ago. I'm just giving the answer, that I would expect from an interviewee... Artificially constructing a sub-optimal solution (you can't beat O(n) + O(k) time, no matter what you do) doesn't make sense to me - except if you can't afford O(n) additional space, but the question isn't explicit on that. – Chris Lercher Aug 16 '10 at 11:30
  • 3
    I've edited the question again to further clarify. I do appreciate the feedback/answer. – polygenelubricants Aug 16 '10 at 11:42
  • Actually you can clearly do a bucket sort on such a well defined list like that which is O(n) not n log n. – Tatarize Dec 17 '15 at 14:04
  • @Tatarize, yes, but counting sorts [usually] require O(n) space. – Elliott Sep 17 '20 at 11:18
  • Still worthwhile O(n) time complexity even if you are making n lists of numbers you fill with like 1 number. – Tatarize Sep 19 '20 at 04:30
34

I haven't checked the maths, but I suspect that computing Σ(n^2) in the same pass as we compute Σ(n) would provide enough info to get two missing numbers. Do Σ(n^3) as well if there are three, and so on.

AakashM
  • 62,551
  • 17
  • 151
  • 186
19

The problem with solutions based on sums of numbers is that they don't take into account the cost of storing and working with numbers with large exponents... in practice, for it to work for very large n, a bignum library would be used. We can analyse the space utilisation for these algorithms.

We can analyse the time and space complexity of sdcvvc and Dimitris Andreou's algorithms.

Storage:

l_j = ceil (log_2 (sum_{i=1}^n i^j))
l_j > log_2 n^j  (assuming n >= 0, k >= 0)
l_j > j log_2 n \in \Omega(j log n)

l_j < log_2 ((sum_{i=1}^n i)^j) + 1
l_j < j log_2 (n) + j log_2 (n + 1) - j log_2 (2) + 1
l_j < j log_2 n + j + c \in O(j log n)

So l_j \in \Theta(j log n)

Total storage used: \sum_{j=1}^k l_j \in \Theta(k^2 log n)

Time used: assuming that computing a^j takes ceil(log_2 j) time, the total time is:

t = k ceil(\sum_i=1^n log_2 (i)) = k ceil(log_2 (\prod_i=1^n (i)))
t > k log_2 (n^n + O(n^(n-1)))
t > k log_2 (n^n) = kn log_2 (n)  \in \Omega(kn log n)
t < k log_2 (\prod_i=1^n i^i) + 1
t < kn log_2 (n) + 1 \in O(kn log n)

Total time used: \Theta(kn log n)

If this time and space is satisfactory, you can use a simple recursive algorithm. Let b!i be the ith entry in the bag, n the number of numbers before removals, and k the number of removals. In Haskell syntax...

let
  -- O(1)
  isInRange low high v = (v >= low) && (v <= high)
  -- O(n - k)
  countInRange low high = sum $ map (fromEnum . isInRange low high . (b !)) [1..(n-k)]
  findMissing l low high krange
    -- O(1) if there is nothing to find.
    | krange == 0 = l
    -- O(1) if there is only one possibility.
    | low == high = low : l
    -- Otherwise total of O(k n log(n)) time.
    | otherwise =
       let
         mid = (low + high) `div` 2
         klow = countInRange low mid
         khigh = krange - klow
       in
         findMissing (findMissing l low mid klow) (mid + 1) high khigh
in
  findMissing [] 1 n k

Storage used: O(k) for the list and O(log(n)) for the stack, i.e. O(k + log(n)). This algorithm is more intuitive, has the same time complexity, and uses less space.

a1kmm
  • 1,354
  • 7
  • 13
16

A very simple solution to Q2 which I'm surprised nobody has posted already. Use the method from Q1 to find the sum S of the two missing numbers. Since the two missing numbers are distinct and sum to S, one of them is smaller than S/2 and the other is bigger than S/2 (duh). In a second pass, sum the bag's numbers that are at most floor(S/2) and compare that to 1 + 2 + ... + floor(S/2) (similarly to the method in Q1); the difference is the smaller missing number. Subtract it from S to find the bigger one.
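
A minimal sketch of this two-pass idea (assuming the bag can be scanned twice; names are mine):

def find_two_missing_by_halving(bag, n):
    s = n * (n + 1) // 2 - sum(bag)        # sum of the two missing numbers
    half = (s - 1) // 2                    # the smaller missing number is <= half
    small = half * (half + 1) // 2 - sum(v for v in bag if v <= half)
    return small, s - small

# e.g. find_two_missing_by_halving([1, 2, 3, 5, 7, 8, 9, 10], 10) == (4, 6)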

Gilad Deutsch
  • 365
  • 3
  • 9
  • 1
    I think this is the same as [Svalorzen's answer](https://stackoverflow.com/a/37419566/9772691), but you explained it in better words. Have any idea how to generalize it to Qk? – John McClane Apr 21 '20 at 14:59
  • 1
    Sorry for missing the other answer. I'm not sure if it's possible to generalize it to $Q_k$ since in that case you cannot bound the smallest missing element to some range. You do know that some element must be smaller than $S/k$ but that might be true for multiple elements – Gilad Deutsch Apr 22 '20 at 15:10
  • 1
    How about this for Q_k: After bisecting at the average, if you count the summands while taking the sum at side of the bisection, you will know the amount of numbers missing on each side - and the problem has been reduced to Q_l on the left side and Q_r on the right side, where l + r = k where l < k and r < k by the same reasoning as in the answer - so these can be solved recursively. – Tarje Bargheer Apr 28 '21 at 08:38
14

Wait a minute. As the question is stated, there are 100 numbers in the bag. No matter how big k is, the problem can be solved in constant time because you can use a set and remove numbers from the set in at most 100 - k iterations of a loop. 100 is constant. The set of remaining numbers is your answer.

If we generalise the solution to the numbers from 1 to N, nothing changes except N is not a constant, so we are in O(N - k) = O(N) time. For instance, if we use a bit set, we set the bits to 1 in O(N) time, iterate through the numbers, setting the bits to 0 as we go (O(N-k) = O(N)) and then we have the answer.

It seems to me that the interviewer was asking you how to print out the contents of the final set in O(k) time rather than O(N) time. Clearly, with a bit set, you have to iterate through all N bits to determine whether you should print the number or not. However, if you change the way the set is implemented you can print out the numbers in k iterations. This is done by putting the numbers into an object to be stored in both a hash set and a doubly linked list. When you remove an object from the hash set, you also remove it from the list. The answers will be left in the list which is now of length k.
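
As a rough illustration, Python's insertion-ordered dict can stand in for the hash-set-plus-linked-list combination described above (strictly speaking, iterating the survivors still walks the underlying table, so a genuine linked-list overlay is needed for true O(k) output):

def missing_numbers(bag, n):
    remaining = dict.fromkeys(range(1, n + 1))   # O(N) setup, O(1) removal per element
    for x in bag:
        remaining.pop(x, None)
    return list(remaining)                       # the k numbers never seen

# e.g. missing_numbers([1, 2, 4, 6], 6) == [3, 5]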

JeremyP
  • 84,577
  • 15
  • 123
  • 161
  • 10
    This answer is too simple, and we all know that simple answers don't work! ;) Seriously though, original question should probably emphasize O(k) space requirement. – DK. Sep 02 '10 at 20:48
  • 1
    The problem is not that is simple but that you'll have to use O(n) additional memory for the map. The problem bust me solved in constant time and constant memory – Mojo Risin Mar 14 '11 at 14:58
  • 4
    I bet you can prove the minimal solution is at least O(N). because less, would mean that you didn't even LOOK at some numbers, and since there is no ordering specified, looking at ALL numbers is mandatory. – v.oddou Apr 03 '14 at 08:12
  • If we look at the input as a stream, and n is too large to keep in memory, the O(k) memory requirement makes sense. We can still use hashing though: Just make k^2 buckets and use the simple sum algorithm on each of them. That's only k^2 memory and a few more buckets can be used to get high probability of success. – Thomas Ahle Apr 26 '16 at 13:31
8

Here's a solution that uses k bits of extra storage, without any clever tricks, just straightforward. Execution time O(n), extra space O(k). Just to prove that this can be solved without reading up on the solution first or being a genius:

void puzzle (int* data, int n, bool* extra, int k)
{
    // data contains n distinct numbers from 1 to n + k, extra provides
    // space for k extra bits. 

    // Rearrange the array so there are (even) even numbers at the start
    // and (odd) odd numbers at the end.
    int even = 0, odd = 0;
    while (even + odd < n)
    {
        if (data [even] % 2 == 0) ++even;
        else if (data [n - 1 - odd] % 2 == 1) ++odd;
        else { int tmp = data [even]; data [even] = data [n - 1 - odd]; 
               data [n - 1 - odd] = tmp; ++even; ++odd; }
    }

    // Erase the lowest bits of all numbers and set the extra bits to 0.
    for (int i = even; i < n; ++i) data [i] -= 1;
    for (int i = 0; i < k; ++i) extra [i] = false;

    // Set a bit for every number that is present
    for (int i = 0; i < n; ++i)
    {
        int tmp = data [i];
        tmp -= (tmp % 2);
        if (i >= even) ++tmp;
        if (tmp <= n) data [tmp - 1] += 1; else extra [tmp - n - 1] = true;
    }

    // Print out the missing ones
    for (int i = 1; i <= n; ++i)
        if (data [i - 1] % 2 == 0) printf ("Number %d is missing\n", i);
    for (int i = n + 1; i <= n + k; ++i)
        if (! extra [i - n - 1]) printf ("Number %d is missing\n", i);

    // Restore the lowest bits again.
    for (int i = 0; i < n; ++i) {
        if (i < even) { if (data [i] % 2 != 0) data [i] -= 1; }
        else { if (data [i] % 2 == 0) data [i] += 1; }
    }
}
gnasher729
  • 51,477
  • 5
  • 75
  • 98
  • Did you want `(data [n - 1 - odd] % 2 == 1) ++odd;`? – Charles Apr 12 '14 at 13:36
  • 2
    Could you explain how this works? I don't understand. – Teepeemm Sep 26 '14 at 14:03
  • The solution would be very, very, simple if I could use an array of (n + k) booleans for temporary storage, but that is not allowed. So I rearrange the data, putting the even numbers at the beginning, and the odd numbers at the end of the array. Now the lowest bits of those n numbers can be used for temporary storage, because I know how many even and odd numbers there are and can reconstruct the lowest bits! These n bits and the k extra bits are exactly the (n + k) booleans that I needed. – gnasher729 Oct 15 '14 at 16:40
  • 3
    This wouldn't work if the data were too large to keep in memory, and you only saw it as a stream. Deliciously hacky though :) – Thomas Ahle Apr 26 '16 at 13:59
  • The space complexity can be O(1). In a first pass, you process all numbers < (n - k) by exactly this algorithm, without using 'extra'. In a second pass, you clear out the parity bits again and use the first k positions for indexing numbers (n-k)..(n). – emu May 04 '16 at 08:12
  • The line `if (i >= odd) ++tmp;` should be `if (i >= even) ++tmp;`. – jcsahnwaldt Reinstate Monica Nov 24 '18 at 12:51
  • The line `for (int i = even; i < n; ++i) data [i] += 1;` doesn't correctly restore bits. – jcsahnwaldt Reinstate Monica Nov 24 '18 at 13:07
  • I fixed these two bugs. – jcsahnwaldt Reinstate Monica Nov 24 '18 at 13:17
  • I'm not sure how this solution is any simpler or more efficient than caf's solution, which was posted a few years before this one. – Elliott Sep 18 '20 at 03:18
8

To solve the 2 (and 3) missing numbers question, you can modify quickselect, which on average runs in O(n) and uses constant memory if partitioning is done in-place.

  1. Partition the set with respect to a random pivot p into partitions l, which contain numbers smaller than the pivot, and r, which contain numbers greater than the pivot.

  2. Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (p - 1 - count(l) = count of missing numbers in l and n - count(r) - p = count of missing numbers in r)

  3. a) If each partition is missing one number, then use the difference of sums approach to find each missing number.

    (1 + 2 + ... + (p-1)) - sum(l) = missing #1 and ((p+1) + (p+2) + ... + n) - sum(r) = missing #2

    b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either (p-1,p-2) or (p+1,p+2) depending on which partition is missing the numbers.

    If one partition is missing 2 numbers but is not empty, then recurse into that partition.

With only 2 missing numbers, this algorithm always discards at least one partition, so it retains O(n) average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because as with 2 missing numbers, at most only 1 partition will contain multiple missing numbers). However, I'm not sure how much the performance decreases when more missing numbers are added.

Here's an implementation that does not use in-place partitioning, so this example does not meet the space requirement but it does illustrate the steps of the algorithm:

<?php

  $list = range(1,100);
  unset($list[3]);
  unset($list[31]);

  findMissing($list,1,100);

  function findMissing($list, $min, $max) {
    if(empty($list)) {
      print_r(range($min, $max));
      return;
    }

    $l = $r = [];
    $pivot = array_pop($list);

    foreach($list as $number) {
      if($number < $pivot) {
        $l[] = $number;
      }
      else {
        $r[] = $number;
      }
    }

    if(count($l) == $pivot - $min - 1) {
      // only 1 missing number use difference of sums
      print array_sum(range($min, $pivot-1)) - array_sum($l) . "\n";
    }
    else if(count($l) < $pivot - $min) {
      // more than 1 missing number, recurse
      findMissing($l, $min, $pivot-1);
    }

    if(count($r) == $max - $pivot - 1) {
      // only 1 missing number use difference of sums
      print array_sum(range($pivot + 1, $max)) - array_sum($r) . "\n";
    } else if(count($r) < $max - $pivot) {
      // more than 1 missing number, recurse
      findMissing($r, $pivot+1, $max);
    }
  }

Demo

FuzzyTree
  • 32,014
  • 3
  • 54
  • 85
  • Partitioning the set is like using linear space. At least it wouldn't work in a streaming setting. – Thomas Ahle Apr 26 '16 at 13:44
  • 1
    @ThomasAhle see https://en.wikipedia.org/wiki/Selection_algorithm#Space_complexity. partioning the set in place only requires O(1) additional space - not linear space. In a streaming setting it would be O(k) additional space, however, the original question does not mention streaming. – FuzzyTree Apr 26 '16 at 14:37
  • Not directly, but he does write "you must scan the input in O(N), but you can only capture small amount of information (defined in terms of k not N)" which is usually the definition of streaming. Moving all the numbers for partitioning isn't really possible unless you have an array of size N. It's just that the question has a lot of answers witch seem to ignore this constraint. – Thomas Ahle Apr 26 '16 at 15:10
  • 1
    @ThomasAhle I think it's a big leap to conclude that "you must scan the input in O(N), but you can only capture small amount of information (defined in terms of k not N)" means op implied the data is being streamed. I interpret it simply as "Your answer must run in O(N) and only use O(k) extra storage" and I would certainly disagree that it's the _definition_ of streaming. – FuzzyTree Apr 26 '16 at 15:33
  • I'm not sure, but I do agree your algorithm is very elegant. With regards to the running time, I think its probably just `nlogk`? – Thomas Ahle Apr 26 '16 at 16:12
  • @ThomasAhle the running time is identical as quick select, which is O(n^2) in the worst case but O(n) on average – FuzzyTree Apr 26 '16 at 18:36
  • 1
    But as you say, the performance may decrease as more numbers are added? We can also use the linear time median algorithm, to always get a perfect cut, but if the k numbers are well spread out in 1,...,n, wont you have to go about logk levels "deep" before you can prune any branches? – Thomas Ahle Apr 26 '16 at 22:01
  • @ThomasAhle I should have said the complexity is identical to quickselect for k < 4. `nlogk` sounds like a reasonable estimate for larger k – FuzzyTree Apr 26 '16 at 22:24
  • @FuzzyTree Why should the worst case performance be n^2? The input numbers are a contiguous range. You can pick an optimal pivot in the middle, thus assuring O(n) for a constant number of missing items. – emu May 03 '16 at 17:34
  • @emu I'm not sure if it's possible to pick an optimal pivot _and_ do in-place partitioning since you don't know where the optimal pivot is. – FuzzyTree May 04 '16 at 00:22
  • @FuzzyTree Of course. The pivot need not be even present and these algorithms will still work the same. – emu May 04 '16 at 07:53
  • 3
    The worst-case running time is indeed nlogk because you need to process the whole input at most logk times, and then it's a geometric sequence (one that starts with at most n elements). The space requirements are logn when implemented with plain recursion, but they can be made O(1) by running an actual quickselect and ensuring the correct length of each partition. – emu May 04 '16 at 08:04
8

For Q2 this is a solution that is a bit more inefficient than the others, but still has O(N) runtime and takes O(k) space.

The idea is to run the original algorithm two times. In the first pass you get the sum of the two missing numbers, which also bounds them. Let's call this sum S. The two missing numbers add up to S, so the first one can only be in the interval [1, floor((S-1)/2)] while the second is going to be in [floor(S/2)+1, S-1].

Thus you loop over all the numbers once again, discarding those that are not in the first interval and keeping track of the sum of those that are. Comparing that sum against the expected sum of the interval gives you one of the two missing numbers, and by extension the second.

I have a feeling that this method could be generalized and maybe multiple searches run in "parallel" during a single pass over the input, but I haven't yet figured out how.

Svalorzen
  • 5,353
  • 3
  • 30
  • 54
  • 1
    Ahaha yes this is the same solution I came up with for Q2, just with calculating the sum again taking the _negatives_ for all numbers below N/2, but this is even better! – xjcl Apr 08 '19 at 14:51
  • 1
    I know' it's four years later, but I figured out how to generalise this (and yes, you could use parallelism to speed it up): https://stackoverflow.com/a/64285525/8658157 – Elliott Oct 09 '20 at 18:44
8

Motivation

If you want to solve the general-case problem, and you can store and edit the array, then Caf's solution is by far the most efficient. If you can't store the array (streaming version), then sdcvvc's answer is the only type of solution currently suggested.

The solution I propose is the most efficient answer (so far on this thread) if you can store the array but can't edit it, and I got the idea from Svalorzen's solution, which solves for 1 or 2 missing items. This solution takes Θ(k*n) time; its space usage is O(min(k,log(n))) and Ω(log(k)). It also works well with parallelism.

Concept

The idea is that if you use the original approach of comparing sums:
sum = SumOf(1,n) - SumOf(array)

... then you take the average of the missing numbers:
average = sum/n_missing_numbers

... which provides a boundary: Of the missing numbers, there's guaranteed to be at least one number less-or-equal to average, and at least one number greater than average. This means that we can split into sub problems that each scan the array [O(n)] and are only concerned with their respective sub-arrays.

Code

C-style solution (don't judge me for the global variables, I'm just trying to make the code readable for non-c folks):

#include "stdio.h"

// Example problem:
const int array [] = {0, 7, 3, 1, 5};
const int N = 8; // size of original array
const int array_size = 5;

int SumOneTo (int n)
{
    return n*(n-1)/2; // non-inclusive
}

int MissingItems (const int begin, const int end, int & average)
{
    // We consider only sub-array elements with values, v:
    // begin <= v < end
    
    // Initialise info about missing elements.
    // First assume all are missing:
    int n = end - begin;
    int sum = SumOneTo(end) - SumOneTo(begin);

    // Minus everything that we see (ie not missing):
    for (int i = 0; i < array_size; ++i)
    {
        if ((begin <= array[i]) && (array[i] < end))
        {
            --n;
            sum -= array[i];
        }
    }
    
    // used by caller:
    average = sum/n;
    return n;
}

void Find (const int begin, const int end)
{
    int average;

    if (MissingItems(begin, end, average) == 1)
    {
        printf(" %d", average); // average(n) is same as n
        return;
    }
    
    Find(begin, average + 1); // at least one missing here
    Find(average + 1, end); // at least one here also
}

int main ()
{   
    printf("Missing items:");
    
    Find(0, N);
    
    printf("\n");
}

Analysis

Ignoring recursion for a moment, each function call clearly takes O(n) time and O(1) space. Note that sum can be as large as n(n-1)/2, so it requires double the number of bits needed to store n-1. At most this means that we effectively need two extra elements' worth of space, regardless of the size of the array or k, hence it's still O(1) space under the normal conventions.

It's not so obvious how many function calls there are for k missing elements, so I'll provide a visual. Your original sub-array (connected array) is the full array, which has all k missing elements in it. We'll imagine them in increasing order, where -- represent connections (part of same sub-array):

m1 -- m2 -- m3 -- m4 -- (...) -- mk-1 -- mk

The effect of the Find function is to disconnect the missing elements into different non-overlapping sub-arrays. It guarantees that there's at least one missing element in each sub-array, which means breaking exactly one connection.

What this means is that regardless of how the splits occur, it will always take k-1 Find function calls to do the work of finding the sub-arrays that have only one missing element in it.

So the time complexity is Θ((k-1 + k) * n) = Θ(k*n).

For the space complexity, if we divide proportionally each time then we get O(log(k)) space complexity, but if we only separate one at a time it gives us O(k).

See here for a proof as to why the space complexity is O(log(n)). Given that above we've shown that it's also O(k), then we know that it's O(min(k,log(n))).

Elliott
  • 2,603
  • 2
  • 18
  • 35
6

Maybe this algorithm can work for question 1:

  1. Precompute the xor of the first 100 integers (val = 1^2^3^4^...^100)
  2. xor the elements as they keep coming from the input stream (val1 = val1 ^ next_input)
  3. final answer = val ^ val1

Or even better:

def GetValue(A):
    val = 0
    for i in range(1, 101):
        val = val ^ i
    for value in A:
        val = val ^ value
    return val

This algorithm can in fact be expanded to handle two missing numbers. The first step remains the same. When we call GetValue with two missing numbers, the result will be a1^a2, where a1 and a2 are the two missing numbers. Let's say

val = a1^a2

Now to sieve out a1 and a2 from val, we take any set bit in val. Let's say the ith bit is set in val. That means that a1 and a2 have different parity at the ith bit position. Now we do another iteration over the original array and keep two xor values: one for the numbers which have the ith bit set and another for those which don't. We now have two buckets of numbers, and it's guaranteed that a1 and a2 will lie in different buckets. Now repeat what we did for finding one missing element on each of the buckets.
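
For illustration, here is a small Python sketch of this two-bucket XOR technique (the function name and the example input are just made up for the demo):

def find_two_missing_xor(bag, n):
    # First pass: XOR of 1..n and of everything in the bag
    # leaves val = a1 ^ a2, the XOR of the two missing numbers.
    val = 0
    for i in range(1, n + 1):
        val ^= i
    for x in bag:
        val ^= x

    # Pick any set bit of val; a1 and a2 differ at that bit.
    low_bit = val & -val

    # Second pass: XOR together (from both the full range and the bag)
    # only the numbers that have that bit set.
    a1 = 0
    for i in range(1, n + 1):
        if i & low_bit:
            a1 ^= i
    for x in bag:
        if x & low_bit:
            a1 ^= x
    a2 = val ^ a1
    return a1, a2

# Example: 1..10 with 4 and 9 removed.
print(find_two_missing_xor([1, 2, 3, 5, 6, 7, 8, 10], 10))  # e.g. (9, 4)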

bashrc
  • 4,725
  • 1
  • 22
  • 49
5

There is a general way to solve streaming problems like this. The idea is to use a bit of randomization to hopefully 'spread' the k elements into independent sub problems, where our original algorithm solves the problem for us. This technique is used in sparse signal reconstruction, among other things.

  • Make an array, a, of size u = k^2.
  • Pick any universal hash function, h : {1,...,n} -> {1,...,u}. (Like multiply-shift)
  • For each i in 1, ..., n increase a[h(i)] += i
  • For each number x in the input stream, decrement a[h(x)] -= x.

If all of the missing numbers have been hashed to different buckets, the non-zero elements of the array will now contain the missing numbers.

The probability that a particular pair is sent to the same bucket, is less than 1/u by definition of a universal hash function. Since there are about k^2/2 pairs, we have that the error probability is at most k^2/2/u=1/2. That is, we succeed with probability at least 50%, and if we increase u we increase our chances.
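
For illustration, here is a small Python sketch of this scheme; the hash family ((a·x + b) mod p) mod u with random a, b and a large prime p is one standard universal family (the specific prime and the example values are arbitrary choices for the demo):

import random

def find_missing_by_hashing(bag, n, k):
    u = max(1, k * k)                 # number of buckets, u = k^2
    p = 2305843009213693951           # a prime (2^61 - 1), larger than n
    a = random.randrange(1, p)
    b = random.randrange(0, p)

    def h(x):
        return ((a * x + b) % p) % u

    buckets = [0] * u
    for i in range(1, n + 1):         # add every number of the full range
        buckets[h(i)] += i
    for x in bag:                     # subtract everything actually present
        buckets[h(x)] -= x

    # If no two missing numbers collided (probability >= 1/2 for u = k^2),
    # each non-zero bucket now holds exactly one missing number.
    return [v for v in buckets if v != 0]

# Example: 1..100 with 17, 42 and 99 removed.
present = [i for i in range(1, 101) if i not in (17, 42, 99)]
print(sorted(find_missing_by_hashing(present, 100, 3)))  # usually [17, 42, 99]

If two missing numbers do collide, a bucket ends up holding their sum, so in practice one would verify the candidates and retry with a fresh hash function (or use the refinements discussed in the comments below).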

Notice that this algorithm takes k^2 logn bits of space (We need logn bits per array bucket.) This matches the space required by @Dimitris Andreou's answer (In particular the space requirement of polynomial factorization, which happens to also be randomized.) This algorithm also has constant time per update, rather than time k in the case of power-sums.

In fact, we can be even more efficient than the power sum method by using the trick described in the comments.

Thomas Ahle
  • 30,774
  • 21
  • 92
  • 114
  • Note: We can also use `xor` in each bucket, rather than `sum`, if that's faster on our machine. – Thomas Ahle Apr 26 '16 at 13:53
  • Interesting but I think this only respects the space constraint when `k <= sqrt(n)` - at least if `u=k^2`? Suppose k=11 and n=100, then you would have 121 buckets and the algorithm would end up being similar to having an array of 100 bits that you check off as you read each # from the stream. Increasing `u` improves the chances of success but there's a limit to how much you can increase it before you exceed the space constraint. – FuzzyTree Apr 27 '16 at 02:18
  • 1
    The problem makes most sense for `n` much larger than `k`, I think, but you can actually get space down to `k logn` with a method very similar to the hashing described, while still having constant time updates. It's described in https://gnunet.org/eppstein-set-reconciliation , like the sum of powers method, but basically you hash to 'two of k' buckets with a strong hash function like tabulation hashing, which guarantees that some bucket will have only one element. To decode, you identify that bucket and removes the element from both of its buckets, which (likely) frees another bucket and so on – Thomas Ahle May 11 '16 at 10:54
  • A variant which is slower but uses less space: If you use 2k buckets, you can find some of the missing numbers, and you can repeat (with a different univ hash each time) to find the rest (you can consider already-found numbers as if they were in the array for additional speedup). If I've done my sums right, this takes O(log(k)) iterations on average, for O(n log k) expected time, and O(k log n) space. There's a bit of extra bookkeeping because you need to store the already-found numbers. – Paul Hankin Aug 24 '22 at 15:47
4

Can you check if every number exists? If yes you may try this:

S = sum of all numbers in the bag (S < 5050)
Z = sum of the missing numbers = 5050 - S

if the missing numbers are x and y then:

x = Z - y and
max(x) = Z - 1

So you check the range from 1 to max(x) and find the number

Nakilon
  • 34,866
  • 14
  • 107
  • 142
Ilian Iliev
  • 3,217
  • 4
  • 26
  • 51
3

You can solve Q2 if you have the sum of both lists and the product of both lists.

(l1 is the original, l2 is the modified list)

d = sum(l1) - sum(l2)
m = mul(l1) / mul(l2)

We can optimise this since the sum of an arithmetic series is n times the average of the first and last terms:

n = len(l1)
d = (n/2)*(n+1) - sum(l2)

Now we know that (if a and b are the removed numbers, and we write s for the sum d):

a + b = s
a * b = m

So we can rearrange to:

a = s - b
b * (s - b) = m

And multiply out:

-b^2 + s*b = m

And rearrange so the right side is zero:

-b^2 + s*b - m = 0

Then we can solve with the quadratic formula:

b = (-s + sqrt(s^2 - (4*-1*-m)))/-2
a = s - b

Sample Python 3 code:

from functools import reduce
import operator
import math
x = list(range(1,21))
sx = (len(x)/2)*(len(x)+1)     # sum of the full list 1..20
x.remove(15)
x.remove(5)
mul = lambda l: reduce(operator.mul,l)
s = sx - sum(x)                # s = a + b
m = mul(range(1,21)) / mul(x)  # m = a * b
b = (-s + math.sqrt(s**2 - (-4*(-m))))/-2
a = s - b
print(a,b) #15,5

I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the complexity of this solution (if anyone does know please comment below.)

Tuomas Laakkonen
  • 1,280
  • 10
  • 15
  • How much time and memory does it use to calculate `x1*x2*x3*...`? – Thomas Ahle Apr 26 '16 at 13:57
  • @ThomasAhle It is O(n)-time and O(1)-space on the length of the list, but in reality it's more as multiplication (at least in Python) is O(n^1.6)-time on the length of the number and numbers are O(log n)-space on their length. – Tuomas Laakkonen Apr 29 '16 at 16:21
  • @ThomasAhle No, log(a^n) = n*log(a) so you would have O(l log k)-space to store the number. So given a list of length l and original numbers of length k, you would have O(l)-space but the constant factor (log k) would be lower than just writing them all out. (I don't think my method is a particularly good way of answering the question.) – Tuomas Laakkonen Apr 29 '16 at 17:16
3

Here is a solution that doesn't rely on complex math as sdcvvc's/Dimitris Andreou's answers do, doesn't change the input array as caf and Colonel Panic did, and doesn't use the bitset of enormous size as Chris Lercher, JeremyP and many others did. Basically, I began with Svalorzen's/Gilad Deutch's idea for Q2, generalized it to the common case Qk and implemented in Java to prove that the algorithm works.

The idea

Suppose we have an arbitrary interval I of which we only know that it contains at least one of the missing numbers. After one pass through the input array, looking only at the numbers from I, we can obtain both the sum S and the quantity Q of missing numbers from I. We do this by simply decrementing I's length each time we encounter a number from I (for obtaining Q) and by decreasing the pre-calculated sum of all numbers in I by each encountered number (for obtaining S).

Now we look at S and Q. If Q = 1, it means that I contains only one of the missing numbers, and this number is clearly S. We mark I as finished (it is called "unambiguous" in the program) and leave it out from further consideration. On the other hand, if Q > 1, we can calculate the average A = S / Q of the missing numbers contained in I. As all numbers are distinct, at least one of these numbers is strictly less than A and at least one is strictly greater than A. Now we split I at A into two smaller intervals, each of which contains at least one missing number. Note that it doesn't matter to which of the intervals we assign A in case it is an integer.

We make the next array pass calculating S and Q for each of the intervals separately (but in the same pass) and after that mark intervals with Q = 1 and split intervals with Q > 1. We continue this process until there are no new "ambiguous" intervals, i.e. we have nothing to split because each interval contains exactly one missing number (and we always know this number because we know S). We start out from the sole "whole range" interval containing all possible numbers (like [1..N] in the question).

Time and space complexity analysis

The total number of passes p we need to make until the process stops is never greater than the missing numbers count k. The inequality p <= k can be proved rigorously. On the other hand, there is also an empirical upper bound p < log2N + 3 that is useful for large values of k. We need to make a binary search for each number of the input array to determine the interval to which it belongs. This adds the log k multiplier to the time complexity.

In total, the time complexity is O(N ᛫ min(k, log N) ᛫ log k). Note that for large k, this is significantly better than that of sdcvvc/Dimitris Andreou's method, which is O(N ᛫ k).

For its work, the algorithm requires O(k) additional space for storing at most k intervals, that is significantly better than O(N) in "bitset" solutions.

Java implementation

Here's a Java class that implements the above algorithm. It always returns a sorted array of missing numbers. Besides that, it doesn't require the missing numbers count k because it calculates it in the first pass. The whole range of numbers is given by the minNumber and maxNumber parameters (e.g. 1 and 100 for the first example in the question).

public class MissingNumbers {
    private static class Interval {
        boolean ambiguous = true;
        final int begin;
        int quantity;
        long sum;

        Interval(int begin, int end) { // begin inclusive, end exclusive
            this.begin = begin;
            quantity = end - begin;
            sum = quantity * ((long)end - 1 + begin) / 2;
        }

        void exclude(int x) {
            quantity--;
            sum -= x;
        }
    }

    public static int[] find(int minNumber, int maxNumber, NumberBag inputBag) {
        Interval full = new Interval(minNumber, ++maxNumber);
        for (inputBag.startOver(); inputBag.hasNext();)
            full.exclude(inputBag.next());
        int missingCount = full.quantity;
        if (missingCount == 0)
            return new int[0];
        Interval[] intervals = new Interval[missingCount];
        intervals[0] = full;
        int[] dividers = new int[missingCount];
        dividers[0] = minNumber;
        int intervalCount = 1;
        while (true) {
            int oldCount = intervalCount;
            for (int i = 0; i < oldCount; i++) {
                Interval itv = intervals[i];
                if (itv.ambiguous)
                    if (itv.quantity == 1) // number inside itv uniquely identified
                        itv.ambiguous = false;
                    else
                        intervalCount++; // itv will be split into two intervals
            }
            if (oldCount == intervalCount)
                break;
            int newIndex = intervalCount - 1;
            int end = maxNumber;
            for (int oldIndex = oldCount - 1; oldIndex >= 0; oldIndex--) {
                // newIndex always >= oldIndex
                Interval itv = intervals[oldIndex];
                int begin = itv.begin;
                if (itv.ambiguous) {
                    // split interval itv
                    // use floorDiv instead of / because input numbers can be negative
                    int mean = (int)Math.floorDiv(itv.sum, itv.quantity) + 1;
                    intervals[newIndex--] = new Interval(mean, end);
                    intervals[newIndex--] = new Interval(begin, mean);
                } else
                    intervals[newIndex--] = itv;
                end = begin;
            }
            for (int i = 0; i < intervalCount; i++)
                dividers[i] = intervals[i].begin;
            for (inputBag.startOver(); inputBag.hasNext();) {
                int x = inputBag.next();
                // find the interval to which x belongs
                int i = java.util.Arrays.binarySearch(dividers, 0, intervalCount, x);
                if (i < 0)
                    i = -i - 2;
                Interval itv = intervals[i];
                if (itv.ambiguous)
                    itv.exclude(x);
            }
        }
        assert intervalCount == missingCount;
        for (int i = 0; i < intervalCount; i++)
            dividers[i] = (int)intervals[i].sum;
        return dividers;
    }
}

For fairness, this class receives input in form of NumberBag objects. NumberBag doesn't allow array modification and random access and also counts how many times the array was requested for sequential traversing. It is also more suitable for large array testing than Iterable<Integer> because it avoids boxing of primitive int values and allows wrapping a part of a large int[] for a convenient test preparation. It is not hard to replace, if desired, NumberBag by int[] or Iterable<Integer> type in the find signature, by changing two for-loops in it into foreach ones.

import java.util.*;

public abstract class NumberBag {
    private int passCount;

    public void startOver() {
        passCount++;
    }

    public final int getPassCount() {
        return passCount;
    }

    public abstract boolean hasNext();

    public abstract int next();

    // A lightweight version of Iterable<Integer> to avoid boxing of int
    public static NumberBag fromArray(int[] base, int fromIndex, int toIndex) {
        return new NumberBag() {
            int index = toIndex;

            public void startOver() {
                super.startOver();
                index = fromIndex;
            }

            public boolean hasNext() {
                return index < toIndex;
            }

            public int next() {
                if (index >= toIndex)
                    throw new NoSuchElementException();
                return base[index++];
            }
        };
    }

    public static NumberBag fromArray(int[] base) {
        return fromArray(base, 0, base.length);
    }

    public static NumberBag fromIterable(Iterable<Integer> base) {
        return new NumberBag() {
            Iterator<Integer> it;

            public void startOver() {
                super.startOver();
                it = base.iterator();
            }

            public boolean hasNext() {
                return it.hasNext();
            }

            public int next() {
                return it.next();
            }
        };
    }
}

Tests

Simple examples demonstrating the usage of these classes are given below.

import java.util.*;

public class SimpleTest {
    public static void main(String[] args) {
        int[] input = { 7, 1, 4, 9, 6, 2 };
        NumberBag bag = NumberBag.fromArray(input);
        int[] output = MissingNumbers.find(1, 10, bag);
        System.out.format("Input: %s%nMissing numbers: %s%nPass count: %d%n",
                Arrays.toString(input), Arrays.toString(output), bag.getPassCount());

        List<Integer> inputList = new ArrayList<>();
        for (int i = 0; i < 10; i++)
            inputList.add(2 * i);
        Collections.shuffle(inputList);
        bag = NumberBag.fromIterable(inputList);
        output = MissingNumbers.find(0, 19, bag);
        System.out.format("%nInput: %s%nMissing numbers: %s%nPass count: %d%n",
                inputList, Arrays.toString(output), bag.getPassCount());

        // Sieve of Eratosthenes
        final int MAXN = 1_000;
        List<Integer> nonPrimes = new ArrayList<>();
        nonPrimes.add(1);
        int[] primes;
        int lastPrimeIndex = 0;
        while (true) {
            primes = MissingNumbers.find(1, MAXN, NumberBag.fromIterable(nonPrimes));
            int p = primes[lastPrimeIndex]; // guaranteed to be prime
            int q = p;
            for (int i = lastPrimeIndex++; i < primes.length; i++) {
                q = primes[i]; // not necessarily prime
                int pq = p * q;
                if (pq > MAXN)
                    break;
                nonPrimes.add(pq);
            }
            if (q == p)
                break;
        }
        System.out.format("%nSieve of Eratosthenes. %d primes up to %d found:%n",
                primes.length, MAXN);
        for (int i = 0; i < primes.length; i++)
            System.out.format(" %4d%s", primes[i], (i % 10) < 9 ? "" : "\n");
    }
}

Large array testing can be performed this way:

import java.util.*;

public class BatchTest {
    private static final Random rand = new Random();
    public static int MIN_NUMBER = 1;
    private final int minNumber = MIN_NUMBER;
    private final int numberCount;
    private final int[] numbers;
    private int missingCount;
    public long finderTime;

    public BatchTest(int numberCount) {
        this.numberCount = numberCount;
        numbers = new int[numberCount];
        for (int i = 0; i < numberCount; i++)
            numbers[i] = minNumber + i;
    }

    private int passBound() {
        int mBound = missingCount > 0 ? missingCount : 1;
        int nBound = 34 - Integer.numberOfLeadingZeros(numberCount - 1); // ceil(log_2(numberCount)) + 2
        return Math.min(mBound, nBound);
    }

    private void error(String cause) {
        throw new RuntimeException("Error on '" + missingCount + " from " + numberCount + "' test, " + cause);
    }

    // returns the number of times the input array was traversed in this test
    public int makeTest(int missingCount) {
        this.missingCount = missingCount;
        // numbers array is reused when numberCount stays the same,
        // just Fisher–Yates shuffle it for each test
        for (int i = numberCount - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            if (i != j) {
                int t = numbers[i];
                numbers[i] = numbers[j];
                numbers[j] = t;
            }
        }
        final int bagSize = numberCount - missingCount;
        NumberBag inputBag = NumberBag.fromArray(numbers, 0, bagSize);
        finderTime -= System.nanoTime();
        int[] found = MissingNumbers.find(minNumber, minNumber + numberCount - 1, inputBag);
        finderTime += System.nanoTime();
        if (inputBag.getPassCount() > passBound())
            error("too many passes (" + inputBag.getPassCount() + " while only " + passBound() + " allowed)");
        if (found.length != missingCount)
            error("wrong result length");
        int j = bagSize; // "missing" part beginning in numbers
        Arrays.sort(numbers, bagSize, numberCount);
        for (int i = 0; i < missingCount; i++)
            if (found[i] != numbers[j++])
                error("wrong result array, " + i + "-th element differs");
        return inputBag.getPassCount();
    }

    public static void strideCheck(int numberCount, int minMissing, int maxMissing, int step, int repeats) {
        BatchTest t = new BatchTest(numberCount);
        System.out.println("╠═══════════════════════╬═════════════════╬═════════════════╣");
        for (int missingCount = minMissing; missingCount <= maxMissing; missingCount += step) {
            int minPass = Integer.MAX_VALUE;
            int passSum = 0;
            int maxPass = 0;
            t.finderTime = 0;
            for (int j = 1; j <= repeats; j++) {
                int pCount = t.makeTest(missingCount);
                if (pCount < minPass)
                    minPass = pCount;
                passSum += pCount;
                if (pCount > maxPass)
                    maxPass = pCount;
            }
            System.out.format("║ %9d  %9d  ║  %2d  %5.2f  %2d  ║  %11.3f    ║%n", missingCount, numberCount, minPass,
                    (double)passSum / repeats, maxPass, t.finderTime * 1e-6 / repeats);
        }
    }

    public static void main(String[] args) {
        System.out.println("╔═══════════════════════╦═════════════════╦═════════════════╗");
        System.out.println("║      Number count     ║      Passes     ║  Average time   ║");
        System.out.println("║   missing     total   ║  min  avg   max ║ per search (ms) ║");
        long time = System.nanoTime();
        strideCheck(100, 0, 100, 1, 20_000);
        strideCheck(100_000, 2, 99_998, 1_282, 15);
        MIN_NUMBER = -2_000_000_000;
        strideCheck(300_000_000, 1, 10, 1, 1);
        time = System.nanoTime() - time;
        System.out.println("╚═══════════════════════╩═════════════════╩═════════════════╝");
        System.out.format("%nSuccess. Total time: %.2f s.%n", time * 1e-9);
    }
}

Try them out on Ideone

John McClane
  • 3,498
  • 3
  • 12
  • 33
2

I think this can be done without any complex mathematical equations and theories. Below is a proposal for an in-place solution with O(n) time complexity (two passes over the array):

Input form assumptions :

# of numbers in bag = n

# of missing numbers = k

The numbers in the bag are represented by an array of length n

Length of input array for the algo = n

Missing entries in the array (numbers taken out of the bag) are replaced by the value of the first element in the array.

Eg. Initially bag looks like [2,9,3,7,8,6,4,5,1,10]. If 4 is taken out, value of 4 will become 2 (the first element of the array). Therefore after taking 4 out the bag will look like [2,9,3,7,8,6,2,5,1,10]

The key to this solution is to tag the INDEX of a visited number by negating the value at that INDEX as the array is traversed.

    IEnumerable<int> GetMissingNumbers(int[] arrayOfNumbers)
    {
        List<int> missingNumbers = new List<int>();
        int arrayLength = arrayOfNumbers.Length;

        //First Pass
        for (int i = 0; i < arrayLength; i++)
        {
            int index = Math.Abs(arrayOfNumbers[i]) - 1;
            if (index > -1)
            {
                arrayOfNumbers[index] = Math.Abs(arrayOfNumbers[index]) * -1; //Marking the visited indexes
            }
        }

        //Second Pass to get missing numbers
        for (int i = 0; i < arrayLength; i++)
        {                
            //If this index is unvisited, means this is a missing number
            if (arrayOfNumbers[i] > 0)
            {
                missingNumbers.Add(i + 1);
            }
        }

        return missingNumbers;
    }
pickhunter
  • 346
  • 6
  • 11
2

Thanks for this very interesting question:

It reminded me of Newton's work, which really can solve this problem.

Please refer to Newton's Identities.

The number of variables to find must equal the number of equations (this is required for consistency).

I believe that for this we should raise the bag numbers to successive powers so as to create that many different equations; that is, there should be a function f (here f(x) = x^j) whose values f(xi) we accumulate:

x1 + x2 + ... + xk = z1

x1^2 + x2^2 + ... + xk^2 = z2

............

x1^k + x2^k + ... + xk^k = zk

The rest is mathematical work; I am not sure about the time and space complexity, but Newton's Identities will surely play an important role.
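
To make the idea concrete for k = 2, here is a small Python sketch using only the first two power sums; the general case needs Newton's identities plus a polynomial root-finding step, as discussed in other answers (the function name and example are just for the demo):

import math

def find_two_missing_power_sums(bag, n):
    # z1 = sum of the missing numbers, z2 = sum of their squares
    z1 = n * (n + 1) // 2
    z2 = n * (n + 1) * (2 * n + 1) // 6
    for x in bag:
        z1 -= x
        z2 -= x * x

    # With a + b = z1 and a^2 + b^2 = z2, we get a*b = (z1^2 - z2) / 2,
    # so a and b are the roots of t^2 - z1*t + a*b = 0.
    prod = (z1 * z1 - z2) // 2
    disc = math.isqrt(z1 * z1 - 4 * prod)
    a = (z1 - disc) // 2
    b = z1 - a
    return a, b

print(find_two_missing_power_sums([1, 2, 4, 6, 7, 8, 9, 10], 10))  # (3, 5)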

  • Can't we use set theory's `.difference_update()` method? Or is there any chance of a linear algebra approach to this question?
1

You'd probably need clarification on what O(k) means.

Here's a trivial solution for arbitrary k: for each v in your set of numbers, accumulate the sum of 2^v. At the end, loop i from 1 to N. If the sum bitwise ANDed with 2^i is zero, then i is missing. (Or numerically: if the floor of the sum divided by 2^i is even. Or: if the sum modulo 2^(i+1) is less than 2^i.)

Easy, right? O(N) time, O(1) storage, and it supports arbitrary k.

Except that you're computing enormous numbers that on a real computer would each require O(N) space. In fact, this solution is identical to a bit vector.
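
For concreteness, a tiny Python sketch of the trick, with Python's arbitrary-precision integers doing the O(N)-bit bookkeeping:

def find_missing_bitsum(bag, n):
    s = 0
    for v in bag:
        s += 1 << v        # "accumulate the sum of 2^v"
    return [i for i in range(1, n + 1) if not (s >> i) & 1]

print(find_missing_bitsum([1, 2, 4, 6, 7, 8, 9, 10], 10))  # [3, 5]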

So you could be clever and compute the sum and the sum of squares and the sum of cubes... up to the sum of v^k, and do the fancy math to extract the result. But those are big numbers too, which raises the question: what abstract model of operation are we talking about? How much fits in O(1) space, and how long does it take to sum up numbers of whatever size you need?

sfink
  • 1,726
  • 1
  • 17
  • 22
  • Nice answer! One little thing: "If sum modulo 2^i is zero, then i is missing" is incorrect. But it's clear what is intended. I think "if sum modulo 2^(i+1) is less than 2^i, then i is missing" would be correct. (Of course, in most programming languages we would use bit shifting instead of modulo calculation. Sometimes programming languages are a bit more expressive than the usual mathematic notation. :-) ) – jcsahnwaldt Reinstate Monica Nov 24 '18 at 14:46
  • 2
    Thanks, you're totally right! Fixed, though I was lazy and strayed from mathematical notation... oh, and I messed that up too. Fixing again... – sfink Nov 27 '18 at 01:17
1

I have read all thirty answers and found the simplest one, i.e. using a bit array of 100, to be the best. But as the question says we can't use an array of size N, I would use O(1) space complexity and k iterations, i.e. O(N·k) time complexity, to solve this.

To make the explanation simpler, consider that I have been given the numbers from 1 to 15 and that two of them, 9 and 14, are missing (but I don't know that yet). Let the bag look like this:

[8,1,2,12,4,7,5,10,11,13,15,3,6].

We know that each number is represented internally in the form of bits. For numbers till 16 we only need 4 bits. For numbers till 10^9, we will need 32 bits. But let's focus on 4 bits and then later we can generalize it.

Now, assume if we had all the numbers from 1 to 15, then internally, we would have numbers like this (if we had them ordered):

0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111

But now we have two numbers missing. So our representation will look something like this (shown ordered for understanding but can be in any order):

(2MSD|2LSD)
00|01
00|10
00|11
-----
01|00
01|01
01|10
01|11
-----
10|00
missing=(10|01) 
10|10
10|11
-----
11|00
11|01
missing=(11|10)
11|11

Now let's make a counting array indexed by the 2 most significant bits (so it has 2^2 = 4 bins) that holds the count of numbers with each corresponding prefix, i.e.

= [__,__,__,__] 
   00,01,10,11

Scan the bag from left to right and fill the above array such that each bin contains the count of numbers with that prefix. The result will be as under:

= [ 3, 4, 3, 3] 
   00,01,10,11

If all the numbers would have been present, it would have looked like this:

= [ 3, 4, 4, 4] 
   00,01,10,11

Thus we know that there are two numbers missing: one whose 2 most significant bits are 10 and one whose 2 most significant bits are 11. Now scan the list again and fill a counting array of 4 bins for the 2 least significant bits. This time, only consider elements whose 2 most significant bits are 10. We will have the counts as:

= [ 1, 0, 1, 1] 
   00,01,10,11

If all numbers with MSD=10 were present, we would have 1 in all the bins, but now we see that one is missing. Thus the number with MSD=10 and LSD=01 is missing, which is 1001, i.e. 9.

Similarly, if we scan again but consider only elements with MSD=11, we get:

= [ 1, 1, 0, 1] 
   00,01,10,11

Here the bin for LSD=10 is empty, so the number with MSD=11 and LSD=10 is missing, which is 1110, i.e. 14.

Thus, we can find the missing numbers in a constant amount of space. We can generalize this for 100, 1000 or 10^9 or any set of numbers.
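
For illustration, here is a small Python sketch of the 4-bit example above; the function name is made up, the 2-bit grouping is hard-coded for the demo, and (as the comments below discuss) generalizing this while keeping the extra space genuinely small is the delicate part:

def find_missing_4bit(bag):
    # Pass 1: count how many numbers fall under each 2-bit top prefix (bits 3-2).
    top_counts = [0] * 4
    for x in bag:
        top_counts[x >> 2] += 1

    # Expected counts if nothing were missing (every 4-bit value except 0, i.e. 1..15).
    expected = [0] * 4
    for v in range(1, 16):
        expected[v >> 2] += 1

    missing = []
    for top in range(4):
        if top_counts[top] == expected[top]:
            continue
        # Extra pass: bucket the low 2 bits of the numbers sharing this deficient prefix.
        low_counts = [0] * 4
        for x in bag:
            if x >> 2 == top:
                low_counts[x & 3] += 1
        for low in range(4):
            value = (top << 2) | low
            if value >= 1 and low_counts[low] == 0:
                missing.append(value)
    return missing

# The example above: 1..15 with 9 and 14 removed.
bag = [8, 1, 2, 12, 4, 7, 5, 10, 11, 13, 15, 3, 6]
print(find_missing_4bit(bag))  # [9, 14]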

References: Problem 1.6 in http://users.ece.utexas.edu/~adnan/afi-samples-new.pdf

Manish
  • 1,999
  • 2
  • 15
  • 26
  • Interesting. This is very similar to how Radix sort works. I'd say that you're trading space for time a bit here. You have `O(1)` space, but `O(n*log(n))` time (as n grows, you'll have to consider more bits). – Elliott Sep 18 '20 at 01:09
  • I don't think space is constant. If I understand correctly, you want to re-use the 4-element array for each input scan. But where do you store the results of the previous input scan? – jcsahnwaldt Reinstate Monica Jun 17 '21 at 12:00
  • I think you could simplify the algorithm by using a 2-element array instead of a 4-element array. During input scan i, you only look at bit i (instead of two adjacent bits). – jcsahnwaldt Reinstate Monica Jun 17 '21 at 12:03
0

You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use k functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions. The elementary symmetric function of degree 2 is s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n, the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.

You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1)). Further thought should convince you that s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1)) and so on, so they can be computed in one pass.
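
For illustration, a small Python sketch of that one-pass update (the modular variant mentioned below would simply reduce each update mod p):

def elementary_symmetric(stream, k):
    # e[j] holds s_j, the j-th elementary symmetric function of the numbers seen so far.
    e = [1] + [0] * k                  # e[0] = 1 by convention
    for x in stream:
        for j in range(k, 0, -1):      # update higher degrees first, so e[j-1] is still the old value
            e[j] += e[j - 1] * x
    return e[1:]

print(elementary_symmetric([2, 3, 5], 3))  # [10, 31, 30] = [2+3+5, 2*3+2*5+3*5, 2*3*5]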

How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-x_2)...(z-x_n). It evaluates to 0 if you put in any of the numbers x_i. Expanding the polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n. The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.

So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.

Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order 100!), we can do these calculations mod p where p is a prime bigger than 100. In that case we evaluate the polynomial mod p and find that it again evaluates to 0 when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends on k, not N, we have to factor the polynomial mod p.

Edward Doolittle
  • 4,002
  • 2
  • 14
  • 27
0

Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:

missing = (1..100).to_a - bag

It's probably not the most efficient solution, but it's the one I would use in real life if I were faced with such a task in this case (known boundaries, low boundaries). If the set of numbers were very large then I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.

DarkDust
  • 90,870
  • 19
  • 190
  • 224
0

This might sound stupid, but, in the first problem presented to you, you would have to see all the remaining numbers in the bag to actually add them up to find the missing number using that equation.

So, since you get to see all the numbers, just look for the number that's missing. The same goes for when two numbers are missing. Pretty simple I think. No point in using an equation when you get to see the numbers remaining in the bag.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Stephan M
  • 33
  • 1
  • 2
    I think the benefit of summing them up is that you don't have to remember which numbers you've already seen (e.g., there's no extra memory requirement). Otherwise the only option is to retain a set of all the values seen and then iterate over that set again to find the one that's missing. – Dan Tao Sep 02 '10 at 23:00
  • 3
    This question is usually asked with the stipulation of O(1) space complexity. –  Sep 14 '10 at 21:38
  • The sum of the first N numbers is N(N+1)/2. For N=100, Sum=100*(101)/2=5050 ; – tmarthal Apr 29 '11 at 01:39
0

You could try using a Bloom filter. Insert each number from the bag into the filter, then iterate over the complete 1..N set and report each number not found. This may not find all the missing numbers in every scenario, but it might be a good enough solution.
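
For illustration, here is a small Python sketch with a hand-rolled Bloom filter; the bit count and number of hash functions are arbitrary example values, and the false positives are exactly why a genuinely missing number can be silently skipped:

import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.m, self.k, self.bits = num_bits, num_hashes, 0

    def _positions(self, x):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, x):
        for pos in self._positions(x):
            self.bits |= 1 << pos

    def might_contain(self, x):
        return all((self.bits >> pos) & 1 for pos in self._positions(x))

def probably_missing(bag, n):
    bloom = BloomFilter(num_bits=8 * n, num_hashes=4)
    for x in bag:
        bloom.add(x)
    # A false positive here means a truly missing number is not reported.
    return [i for i in range(1, n + 1) if not bloom.might_contain(i)]

present = [i for i in range(1, 101) if i not in (13, 58)]
print(probably_missing(present, 100))  # usually [13, 58], but not guaranteed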

jdizzle
  • 4,078
  • 1
  • 30
  • 38
  • There is also the counting bloom filter, which allows deletion. Then you can just add all the numbers and delete the ones you see in the stream. – Thomas Ahle Apr 25 '16 at 21:31
  • Haha this is probably one of the more practical answers, but gets little attention. – ldog May 26 '19 at 14:48
  • A Bloom Filter (BF) still takes linear space. As you say, it doesn't guarantee a solution. The better version of this is a boolean (bit) array, which takes the minimum amount of linear space, does it in O(n) time, and always gets the right answer. The BF is basically trying to make use of this technique when the key numbers are larger that the array size, which we don't have to worry about in our case, so there's no need for the compromise designed by the BF. – Elliott Sep 18 '20 at 02:46
0

I'd take a different approach to that question and probe the interviewer for more details about the larger problem he's trying to solve. Depending on the problem and the requirements surrounding it, the obvious set-based solution might be the right thing and the generate-a-list-and-pick-through-it-afterward approach might not.

For example, it might be that the interviewer is going to dispatch n messages and needs to know the k that didn't result in a reply and needs to know it in as little wall clock time as possible after the n-kth reply arrives. Let's also say that the message channel's nature is such that even running at full bore, there's enough time to do some processing between messages without having any impact on how long it takes to produce the end result after the last reply arrives. That time can be put to use inserting some identifying facet of each sent message into a set and deleting it as each corresponding reply arrives. Once the last reply has arrived, the only thing to be done is to remove its identifier from the set, which in typical implementations takes O(log k+1). After that, the set contains the list of k missing elements and there's no additional processing to be done.
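
For illustration, a minimal Python sketch of that bookkeeping (the message identifiers and the reply stream are stand-ins invented for the demo):

def track_missing_replies(sent_ids, reply_stream):
    outstanding = set(sent_ids)        # one entry per dispatched message
    for reply_id in reply_stream:
        outstanding.discard(reply_id)  # remove it as its reply arrives
    return outstanding                 # the k messages that never got a reply

sent = range(1, 101)
replies = [i for i in range(1, 101) if i not in (12, 57)]
print(sorted(track_missing_replies(sent, replies)))  # [12, 57]

Note that Python's built-in set is hash-based, whereas the O(log k) per-operation figure in the answer assumes a tree-based set; either way the work is spread over the idle time between messages.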

This certainly isn't the fastest approach for batch processing pre-generated bags of numbers because the whole thing runs O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k)). But it does work for any value of k (even if it's not known ahead of time) and in the example above it was applied in a way that minimizes the most critical interval.

Blrfl
  • 6,817
  • 1
  • 25
  • 25
-1

I think this can be generalized like this:

Denote S, M as the initial values for the sum of arithmetic series and multiplication.

S = 1 + 2 + 3 + 4 + ... + n = (n+1)*n/2
M = 1 * 2 * 3 * 4 * .... * n 

I should think about a formula to calculate this, but that is not the point. Anyway, if one number is missing, you already provided the solution. However, if two numbers are missing then, let's denote the new sum and total multiple by S1 and M1, which will be as follows:

S1 = S - (a + b)....................(1)

Where a and b are the missing numbers.

M1 = M - (a * b)....................(2)

Since you know S1, M1, M and S, the above equation is solvable to find a and b, the missing numbers.

Now for the three numbers missing:

S2 = S - ( a + b + c)....................(1)

Where a, b and c are the missing numbers.

M2 = M - (a * b * c)....................(2)

Now you have three unknowns while you have just two equations to solve from.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jack_of_All_Trades
  • 10,942
  • 18
  • 58
  • 88
  • 1
    The multiplication gets quite large though.. Also, how do you generalize to more than 2 missing numbers? – Thomas Ahle Apr 26 '16 at 13:39
  • 1
    I have tried these formulae on very simple sequence with N = 3 and missing numbers = {1, 2}. I didn't work, as I believe the error is in formulae (2) which should read `M1 = M / (a * b)` (see [that answer](http://stackoverflow.com/a/26959097/267197)). Then it works fine. – dma_k Sep 02 '16 at 08:37
-1

I don't know whether this is efficient or not but I would like to suggest this solution.

  1. Compute the xor of the 100 elements
  2. Compute the xor of the 98 elements (after the 2 elements are removed)
  3. Now (result of 1) XOR (result of 2) gives you the xor of the two missing numbers, i.e. a XOR b if a and b are the missing elements
  4. Get the sum of the missing numbers with your usual sum-formula-difference approach, and let's say the difference is d.

Now run a loop to get the possible pairs (p, q), both of which lie in [1, 100] and sum to d.

When a pair is obtained, check whether (result of 3) XOR p = q, and if yes we are done.
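
For illustration, a small Python sketch of this procedure (the function name is made up); as the comment below points out, extending the pair search beyond two missing numbers gets expensive:

def find_two_missing_xor_and_sum(bag, n):
    xor_missing = 0
    sum_missing = n * (n + 1) // 2
    for i in range(1, n + 1):
        xor_missing ^= i
    for x in bag:
        xor_missing ^= x          # step 3: xor of the two missing numbers
        sum_missing -= x          # step 4: d, the sum of the two missing numbers
    d = sum_missing
    for p in range(1, d // 2 + 1):
        q = d - p
        if 1 <= q <= n and (p ^ q) == xor_missing:
            return p, q

# Example: 1..10 with 4 and 7 removed.
print(find_two_missing_xor_and_sum([2, 3, 5, 6, 8, 9, 10, 1], 10))  # (4, 7)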

Please correct me if I am wrong and also comment on time complexity if this is correct

user2221214
  • 133
  • 1
  • 13
  • 3
    I don't think the sum and xor uniquely define two numbers. Running a loop to get all possible k-tuples that sum to d takes time O(C(n,k-1))=O(nk-1), which, for k>2, is bad. – Teepeemm Sep 26 '14 at 13:57
-1

Try to find the product of numbers from 1 to 50:

Let product, P1 = 1 x 2 x 3 x ............. 50

When you take out numbers one by one, multiply them so that you get the product P2. But two numbers are missing here, hence P2 < P1.

The product of the two missing terms, a x b = P1 - P2.

You already know the sum, a + b = S1.

From the above two equations, solve for a and b through a quadratic equation. a and b are your missing numbers.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Manjunath
  • 15
  • 1
  • Provably there are no quadratic equations for numbers 3 or greater. Just 2. – Tatarize Dec 17 '15 at 14:12
  • I tried to apply the given formulae but I failed. Let's take N=3 (sequence `{1,2,3}`) with two missing numbers `{a,b} = {1,2}`. That results `a×b = 6-3, a+b = 6` ⇒ `b=6-a, a²-6a+3 = 0` ⇒ wrong. – dma_k Sep 02 '16 at 08:44
  • 1
    `axb` is not `P1-P2`, it's `P1/P2` – Wander3r Feb 06 '19 at 06:40
-1

I believe I have an O(k) time and O(log(k)) space algorithm, given that you have the floor(x) and log2(x) functions for arbitrarily big integers available:

You keep one arbitrarily big integer s in which the bit at position x records whether x is missing: start with s = 2^1 + 2^2 + ... + 2^N and subtract 2^x for each number x you find in the bag. This takes O(N) time (which is not a problem for the interviewer). At the end, j = floor(log2(s)) is the biggest missing number. Then set s = s - 2^j and do the above again:

for (i = 0 ; i < k ; i++)
{
  j = floor(log2(s)); // position of the highest bit still set = largest remaining missing number
  missing[i] = j;
  s -= pow(2, j);     // clear that bit
}

Now, you usually don't have floor and log2 functions for 2756-bit integers but instead for doubles. So? Simply, for each 2 bytes (or 1, or 3, or 4) you can use these functions to get the desired numbers, but this adds an O(N) factor to time complexity

Nakilon
  • 34,866
  • 14
  • 107
  • 142
-1

Yet another way is using residual graph filtering.

Suppose we have numbers 1 to 4 and 3 is missing. The binary representation is the following,

1 = 001b, 2 = 010b, 3 = 011b, 4 = 100b

And I can create a flow-graph like the following.

                   1
             1 -------------> 1
             |                | 
      2      |     1          |
0 ---------> 1 ----------> 0  |
|                          |  |
|     1            1       |  |
0 ---------> 0 ----------> 0  |
             |                |
      1      |      1         |
1 ---------> 0 -------------> 1

Note that the flow graph contains x nodes, with x being the number of bits. And the maximum number of edges is (2*x)-2.

So for 32 bit integer it will take O(32) space or O(1) space.

Now if I remove capacity for each number starting from 1,2,4 then I am left with a residual graph.

0 ----------> 1 ---------> 1

Finally I shall run a loop like the following,

 result = []
 for x in range(1, n):
     if exists_path_in_residual_graph(x):
         result.append(x)

Now result contains numbers that are not missing as well (false positives). But k <= (size of the result) <= n when there are k missing elements.

I shall go through the given list one last time to mark the result missing or not.

So the time complexity will be O(n) .

Finally, it is possible to reduce the number of false positives (and the space required) by taking nodes 00, 01, 11, 10 instead of just 0 and 1.

KRoy
  • 1,290
  • 14
  • 10
  • 4
    I don't understand your graph diagram. What do the nodes, edges, and numbers represent? Why are some of the edges directed and not others? – dain Sep 19 '19 at 15:58
  • 4
    In fact I don't really understand your answer at all, can you clarify some more? – dain Sep 19 '19 at 16:13
-1

Let's say an ArrayList object (myList) is filled with those numbers, and in it, 2 numbers x and y are missing. So a possible solution can be:

int k = 1;
while (k <= 100) {
    if (!myList.contains(k)) {
        System.out.println("Missing No:" + k);
    }
    k++;
}
SagarS
  • 21
  • 1
  • 6
  • The `contains` method runs in O(n) time, so you're solution is a O(n^2) solution, which is slower than just sorting the array first then iterating over to find what's missing [O(n*log(n)) time, O(1) space, (with heap sort)]. – Elliott Sep 18 '20 at 00:55
-1

You can also create an array of booleans of size last_element_in_the_existing_array + 1.

In a for loop mark all the element true that are present in the existing array.

In another for loop print the index of the elements which contains false AKA The missing ones.

Time Complexity: O(last_element_in_the_existing_array)

Space Complexity: O(array.length)

Debu Shinobi
  • 2,057
  • 18
  • 21
  • The OP said `I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence each number by the value of a designated bit, therefore using O(N) bits in additional space.` This answer requires far too much memory. – Elliott Sep 18 '20 at 01:00
-2

A possible solution:

public class MissingNumber {
    public static void main(String[] args) {
        // 0-20
        int [] a = {1,4,3,6,7,9,8,11,10,12,15,18,14};
        printMissingNumbers(a,20);
    }

    public static void printMissingNumbers(int [] a, int upperLimit){
        int b [] = new int[upperLimit];
        for(int i = 0; i < a.length; i++){
            b[a[i]] = 1;
        }
        for(int k = 0; k < upperLimit; k++){
            if(b[k] == 0)
                System.out.println(k);
        }
    }
}
Rubens
  • 14,478
  • 11
  • 63
  • 92
  • It should be `System.out.println(k + 1)` since `0` is not counted to as missing number. Also, this doesn't work in an unsorted list of numbers. – mr5 Dec 12 '13 at 02:29
-2

You could use binary search to find intervals of missing (or contiguous) numbers. The run time should be around (num intervals) * log(avg interval length) * N. Useful if there aren't many intervals.

Rusty Rob
  • 16,489
  • 8
  • 100
  • 116
  • A binary search assumes that the list is sorted. If the list is sorted then the problem is trivially solvable in O(n) time and O(1) space by just iterating over the list and print when you notice that numbers have been skipped. – Elliott Sep 18 '20 at 01:21
  • @Elliott sorry i should have been more clear, we are binary searching for missing intervals. E.g. start with (0, 100), in O(N) we see this interval contains less than 100 items, so we change the interval to (0, 50), then (0,25), then we maybe see 25 items from (0,25), so we try (25, 50), and so on, So we use 0 space and no sorting required – Rusty Rob Sep 19 '20 at 04:36
  • I'm sorry, could you explain this more? At your iteration of size 100, you said that you could "see" in linear time that there are less than 100 numbers (presumably unique numbers). But how? Because this appears to be a kind of divide and conquer method, we no longer have useful bounds on the element values. And what happens if all elements are unique except at indices 5 and 35: when you look at [0,24] you see all unique values, then look at [25,49] and see all unique values. This doesn't seem to help us... – Elliott Sep 19 '20 at 07:33
  • 1+2+..+n = n*(n+1)/2, so if we maintain a count and only add numbers to the count if they're within the interval, then at the end we can see if the interval is the size we would expect e.g. for (a, b) we expect count to be b*(b+1)/2 - (a-1)*a/2. In the problem statement it mentions 'Each number appears exactly once'. People have already mentioned how to solve if theres one or zero missing from an interval. This is away to extend to K that's fairly easy to code, possibly reasonably efficient, and constant space – Rusty Rob Sep 19 '20 at 07:56
  • Okay, thanks for your explanation. It has best-case time complexity O(kn) and worst-case O(n^2). I downvoted the answer earlier, but I'll remove if/when you edit the answer to explain what you've just said. – Elliott Sep 19 '20 at 11:55
-2

One way to do it would be to calculate modulo the prime number 101.

Calculate and store the product of the integers 1 up to 100, and reduce this number modulo 101. Little exercise: the result will be 100, i.e. -1 (Wilson's theorem).

Calculate and store the sum of all the numbers 1 up to 100, and reduce the result modulo 101. Little exercise: the result will be 0.

Now suppose the bag has the numbers x and y removed.

Calculate the product and sum of everything in the bag modulo 101. So I will know the values of

a = x+y and b = x*y

modulo 101.

Now it is easy to find x and y modulo 101 (solve a quadratic polynomial over the finite field with 101 elements).

Now you know x and y modulo 101. But since you also know that x and y are smaller than 101, you know their true values.
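
For illustration, a small Python sketch of this approach; since the field has only 101 elements, the quadratic is solved here by brute force instead of the quadratic formula mod p (the function name and example are made up for the demo):

P = 101  # prime just above N = 100

def find_two_missing_mod_p(bag):
    s = sum(range(1, 101)) % P        # sum of 1..100 mod 101, i.e. 0
    m = 1
    for i in range(1, 101):
        m = m * i % P                 # product of 1..100 mod 101, i.e. 100 (Wilson's theorem)
    for x in bag:
        s = (s - x) % P               # leaves a = (x + y) mod 101
        m = m * pow(x, P - 2, P) % P  # divide out x (Fermat inverse), leaves b = (x * y) mod 101
    a, b = s, m
    # x and y are the roots of t^2 - a*t + b over GF(101); the field is tiny, so brute-force them.
    return sorted(t for t in range(1, 101) if (t * t - a * t + b) % P == 0)

bag = [i for i in range(1, 101) if i not in (23, 87)]
print(find_two_missing_mod_p(bag))    # [23, 87]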

mnr
  • 604
  • 2
  • 7
  • 15
-2

We can do the Q1 and Q2 in O(log n) most of the time.

Suppose our memory chip consists of an array of n test tubes. And a number x in the test tube is represented by x milliliters of chemical liquid.

Suppose our processor is a laser light. When we light up the laser, it traverses all the tubes perpendicularly to its length. Every time it passes through the chemical liquid, the luminosity is reduced by 1. And passing the light at a certain milliliter mark is an operation of O(1).

Now if we light our laser at the middle mark of the test tubes and the output luminosity

  • equals a pre-calculated value (calculated when no numbers were missing), then the missing numbers are greater than n/2.
  • is smaller, then there is at least one missing number that is smaller than n/2. We can also check whether the luminosity is reduced by 1 or 2. If it is reduced by 1, then one missing number is smaller than n/2 and the other is bigger than n/2. If it is reduced by 2, then both numbers are smaller than n/2.

We can repeat the above process again and again narrowing down our problem domain. In each step, we make the domain smaller by half. And finally we can get to our result.

Parallel algorithms that are worth mentioning(because they are interesting),

  • sorting by some parallel algorithm, for example, parallel merge can be done in O(log^3 n) time. And then the missing number can be found by binary search in O(log n) time.
  • Theoretically, if we have n processors then each process can check one of the inputs and set some flag that identifies the number(conveniently in an array). And in the next step each process can check each flag and finally output the number that is not flagged. The whole process will take O(1) time. It has additional O(n) space/memory requirement.

Note, that the two parallel algorithms provided above may need additional space as mentioned in the comment.

KRoy
  • 1,290
  • 14
  • 10
  • While the test-tube-laser method is genuinely interesting, I hope you agree that it doesn't translate well to hardware instructions and so quite unlikely to be `O(logn)` on a computer. – SirGuy Dec 15 '17 at 15:18
  • 2
    As for your sorting method, that will take an amount of extra space that depends on `N`, and more than `O(N)` time (in terms of its dependency on `N`), which we are intending to do better than. – SirGuy Dec 15 '17 at 15:19
  • @SirGuy I appreciate your concern about test-tube concept and parallel processing memory cost. My post is to share my thoughts about the problem. GPU processors are now doing parallel processing possible. Who knows, if the test-tube concept wont be available in future. – KRoy Dec 15 '17 at 21:56
-3

If a number only appears exactly one time, it is pretty easy to tell in the following way:

Create a boolean array, boolArray, of size of the given number; here it is 100.

Loop through the input numbers and set an element to true according to the number's value. For example, if 45 is found, then set boolArray[45-1] = true;

That will be an O(N) operation.

Then loop through boolArray. If an element is still false, then the index of the element + 1 is the missing number. For example, if boolArray[44] is false, we know number 45 is missing.

That is an O(n) operation. Space complexity is O(1).

So this solution can work to find any missing number from a given continuous number set.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Boveyking
  • 49
  • 3
  • 3
    No, space complexity of this approach is O(n). Futhermore, this way is already mentioned in the question. – sdcvvc Nov 23 '12 at 19:13
-3

We can use the following simple code to find repeating and missing values:

    int size = 8;
    int arr[] = {1, 2, 3, 5, 1, 3};
    int result[] = new int[size];

    for(int i =0; i < arr.length; i++)
    {
        if(result[arr[i]-1] == 1)
        {
            System.out.println("repeating: " + (arr[i]));
        }
        result[arr[i]-1]++;
    }

    for(int i =0; i < result.length; i++)
    {
        if(result[i] == 0)
        {
            System.out.println("missing: " + (i+1));
        }
    }
-3
// Size of numbers
def n=100;

// A list of numbers that is missing k numbers.
def list;

// A map
def map = [:];

// Populate the map so that it contains all numbers.
for(int index=0; index<n; index++)
{
  map[index+1] = index+1;  
}

// Get size of list that is missing k numbers.
def size = list.size();

// Remove all numbers, that exists in list, from the map.
for(int index=0; index<size; index++)
{
  map.remove(list.get(index));  
}

// Content of map is missing numbers
println("Missing numbers: " + map);
  • This requires O(n) space, and worst-case scenario it's O(n^2) time (generic maps can take O(n) time to add an element - good maps just make it less likely to happen). – Elliott Sep 18 '20 at 01:28
-3

Let us assume it is an array from 1 to N and its elements are a1, a2, ...., aN:

1 + N = N + 1;
2 + (N - 1) = N + 1;

..... So each such pair has the same sum. We can scan the (sorted) array from the start and from the end and add the two elements. If the sum is N + 1, then okay; otherwise some numbers are missing.

for (i = 1; i <= N/2; i++) {
    temp = a[i] + a[N + 1 - i];
    if (temp != N + 1) then
        Find the missing number or numbers
}

Iterate this loop, and you get the answer easily.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Surendra Bobba
  • 476
  • 3
  • 17
-3

The key is to use the indexes to mark whether a number is present or not in the range. Here we know that we have 1 to N. Time complexity: O(n). Space complexity: O(1).

Followup questions: This may be modified to find whether an element is missing from an AP with common difference d. Another variation is to find the first missing positive number in a random array that also contains negative numbers: first partition around 0 (as in quicksort's partition step), then do this procedure on the right-hand side of the partitioned array, with the necessary modifications.

public static void missing(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        int val = arr[i];
        // Follow the chain of values, marking each visited position with -1.
        // val - 1 because array indexes start from 0, i.e. arr[0] == -1 indicates that 1 is present in the array.
        while (val >= 1 && val <= arr.length && arr[val - 1] != -1) {
            int next = arr[val - 1]; // remember the value stored there before overwriting it
            arr[val - 1] = -1;
            val = next;
        }
    }
}

After this we need to iterate over the array and check: if a[i] != -1, then i+1 is a missing number. We have to be careful when a[i] > N.

Rahul
  • 132
  • 5
  • "do a quick sort"? That doesn't fit inside the O(n) time and O(1) space complexities. – SirGuy Feb 26 '15 at 20:28
  • @GuyGreer, I should have been more precise with words. When I said quick sort , I meant partition around "0". I think you didn't understand at all. You saw quick sort and and jumped to down-vote!. – Rahul Feb 26 '15 at 22:33
  • What do you mean by "partition around 0"? I would interpret that to mean "find which numbers are greater than 0, and which less". But we know that the numbers come from 1 to N, so my interpretation doesn't gain us any information. – Teepeemm Nov 08 '15 at 20:42
-4

Disclaimer: I have read this question for days, but it is beyond my knowledge to understand the math.

I tried to solve it using set:

arr=[1,2,4,5,7,8,10] # missing 3,6,9
NMissing=3
arr_origin = list(range(1,arr[-1]+1))

for i in range(NMissing):
      arr.append(arr[-1]) ##### assuming you do not delete the last one

arr=set(arr)
arr_origin=set(arr_origin)
missing=arr_origin-arr # 3 6 9
janicebaratheon
  • 976
  • 1
  • 10
  • 21
  • This uses `O(N)` extra space and not `O(1)`. This code also raises an exception because you can't append to an `int` (which you are doing in your loop). The code will also fail if the last number is one of the ones removed, but depending on how exactly you determine `N` that may not be an issue. Why do you ask people not to downvote your answer? If you think people will just downvote it why did you post this at all? Asking not to downvote is not (and shouldn't) stop people from downvoting incorrect answers. – SirGuy Jan 19 '17 at 15:45
  • @GuyGreer just changed to " arr.append". Thank you for your comment. – janicebaratheon Jan 19 '17 at 17:23
  • 1
    This code can be summed up by `missing = set(range(1, len(arr)+NMissing)) - set(arr)`. The loop is unnecessary and you can make the `set` from the `range` directly. This doesn't change the fact that the whole point of this question is to solve this without allocating an array the length of `len(arr)` and while only reading through the data once. This solution achieves neither of these. – SirGuy Jan 19 '17 at 17:36
-5

For different values of k, the approach will be different so there won't be a generic answer in terms of k. For example, for k=1 one can take advantage of the sum of natural numbers, but for k = n/2, one has to use some kind of bitset. The same way for k=n-1, one can simply compare the only number in the bag with the rest.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
bhups
  • 14,345
  • 8
  • 49
  • 57
  • 3
    As you can see from the many other answers, there *are* generic algorithms that work for any k. The bitset approach does not run in extra space O(k). – Sneftel Aug 16 '18 at 05:48
-6

This is a very easy question

void findMissing(){
    bool record[N] = {0};
    for(int i = 0; i < N; i++){
        record[bag[i]-1] = 1;
    }
    for(int i = 0; i < N; i++){
        if(!record[i]) cout << i+1 << endl;
    }
}

O(n) time and space complexity

  • 3
    We're specifically not looking for the O(n) space solution of writing everything down. – Teepeemm Sep 26 '14 at 13:57
  • 1
    Further Optimizations 1) Use bits instead an array of bools 2) Use a link list full of numbers 1-N and remove the ones you find Also, your smart equations still equate to my solution when you boil it down. – user3692106 Dec 16 '14 at 16:36
  • The sum(x), sum(x^2), etc. method is nothing at all like using a bitset, except that you get the same answer. I guess mergesort equates to quicksort, too? – Peter Cordes Sep 09 '15 at 21:02
-6
    //sort the array first
    int missingNum[2]; //missing 2 numbers - can be applied to more than 2
    int j = 0;
    for(int i = 0; i < length - 1; i++){
        // record every number skipped between consecutive elements
        for(int m = arr[i] + 1; m < arr[i+1]; m++) {
            missingNum[j] = m;
            j++;
        }
    }
rldl
  • 73
  • 1
  • 9
-6

I have written the code using Java 8 and before Java 8. It uses the formula (N*(N+1))/2 for the sum of all the numbers.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

   /**
 * 
 * 
 * @author pradeep
 * 
 *         Answer : SumOfAllNumbers-SumOfPresentNumbers=Missing Number;
 * 
 *         To GET SumOfAllNumbers : Get the highest number (N) by checking the
 *         length. and use the formula (N*(N+1))/2
 * 
 *         To GET SumOfPresentNumbers: iterate and add it
 * 
 * 
 */
public class FindMissingNumber {
    /**
     * Before Java 8
     * 
     * @param numbers
     * @return
     */
    public static int missingNumber(List<Integer> numbers) {
        int sumOfPresentNumbers = 0;
        for (Integer integer : numbers) {
            sumOfPresentNumbers = sumOfPresentNumbers + integer;
        }
        int n = numbers.size();
        int sumOfAllNumbers = (n * (n + 1)) / 2;
        return sumOfAllNumbers - sumOfPresentNumbers;
    }
    /**
     * Using Java 8 . mapToInt & sum using streams.
     * 
     * @param numbers
     * @return
     */
    public static int missingNumberJava8(List<Integer> numbers) {
        int sumOfPresentNumbers = numbers.stream().mapToInt(i -> i).sum();
        int n = numbers.size();
        int sumOfAllNumbers = (n * (n + 1)) / 2;
        return sumOfAllNumbers - sumOfPresentNumbers;
    }
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list = Arrays.asList(0, 1, 2, 4);
        System.out.println("Missing number is :  " + missingNumber(list));
        System.out.println("Missing number using Java 8 is : " + missingNumberJava8(list));
    }
}