Alright here we go...apologies to anyone expecting a faster solution. It turns out my teacher was having a little fun with me and I completely missed the point of what he was saying.
I should begin by clarifying what I meant by:
he hinted that there was an even faster way of doing it
The gist of our conversation was this: he said that my XOR approach was interesting, and we talked for a while about how I arrived at my solution. He asked me whether I thought my solution was optimal. I said I did (for the reasons I mentioned in my question). Then he asked me, "Are you sure?" with a look on his face I can only describe as "smug". I was hesitant but said yeah. He asked me if I could think of a better way to do it. I was pretty much like, "You mean there's a faster way?" but instead of giving me a straight answer he told me to think about it. I said I would.
So I thought about it, sure that my teacher knew something I didn't. And after not coming up with anything for a day, I came here.
What my teacher actually wanted me to do was defend my solution as being optimal, not try to find a better solution. As he put it: creating a nice algorithm is the easy part, the hard part is proving it works (and that it's the best). He thought it was quite funny that I spent so much time in Find-A-Better-Way Land instead of working out a simple proof of O(n) that would have taken considerably less time (we ended up doing so, see below if you're interested).
So I guess, big lesson learned here. I'll be accepting Shashank Gupta's answer because I think that it does manage to answer the original question, even though the question was flawed.
I'll leave you guys with a neat little Python one-liner I found while typing the proof. It's not any more efficient but I like it:
def getUniqueElement(a, b):
return reduce(lambda x, y: x^y, a + b)
A Very Informal "Proof"
Let's start with the original two arrays from the question, a
and b
:
int[] a = {6, 5, 6, 3, 4, 2};
int[] b = {5, 7, 6, 6, 2, 3, 4};
We'll say here that the shorter array has length n
, then the longer array must have length n + 1
. The first step to proving linear complexity is to append the arrays together into a third array (we'll call it c
):
int[] c = {6, 5, 6, 3, 4, 2, 5, 7, 6, 6, 2, 3, 4};
which has length 2n + 1
. Why do this? Well, now we have another problem entirely: finding the element that occurs an odd number of times in c
(from here on "odd number of times" and "unique" are taken to mean the same thing). This is actually a pretty popular interview question and is apparently where my teacher got the idea for his problem, so now my question has some practical significance. Hooray!
Let's assume there is an algorithm faster than O(n), such as O(log n). What this means is that it will only access some of the elements of c
. For example, an O(log n) algorithm might only have to check log(13) ~ 4 of the elements in our example array to determine the unique element. Our question is, is this possible?
First let's see if we can get away with removing any of the elements (by "removing" I mean not having to access it). How about if we remove 2 elements, so that our algorithm only checks a subarray of c
with length 2n - 1
? This is still linear complexity, but if we can do that then maybe we can improve upon it even further.
So, let's choose two elements of c
completely at random to remove. There are actually several things that could happen here, which I'll summarize into cases:
// Case 1: Remove two identical elements
{6, 5, 6, 3, 4, 2, 5, 7, 2, 3, 4};
// Case 2: Remove the unique element and one other element
{6, 6, 3, 4, 2, 5, 6, 6, 2, 3, 4};
// Case 3: Remove two different elements, neither of which are unique
{6, 5, 6, 4, 2, 5, 7, 6, 6, 3, 4};
What does our array now look like? In the first case, 7 is still the unique element. In the second case there is a new unique element, 5. And in the third case there are now 3 unique elements...yeah it's a total mess there.
Now our question becomes: can we determine the unique element of c
just by looking at this subarray? In the first case we see that 7 is the unique element of the subarray, but we can't be sure it is also the unique element of c
; the two removed elements could have just as well been 7 and 1. A similar argument applies for the second case. In case 3, with 3 unique elements we have no way of telling which two are non-unique in c
.
It becomes clear that even with 2n - 1
accesses, there is just not enough information to solve the problem. And so the optimal solution is a linear one.
Of course, a real proof would use induction and not use proof-by-example, but I'll leave that to someone else :)