Find the element occuring once in an array where all other elements occur twice (without using XOR)

Question

I have tried solving this for so long but I can't seem to be able to.
The question is as follows:

Given an array n numbers where all of the numbers in it occur twice except for one, which occurs only once, find the number that occurs only once.

Now, I have found many solutions online for this, but none of them satisfy the additional constraints of the question.
The solution should:

Run in linear time (aka O(n)).
Not use hash tables.
Assume that computer supports only comparison and the arithmetic (addition, subtraction, multiplication, division).
The number of bits in each number in the array is about O(log(n)).

Therefore, trying something like this https://stackoverflow.com/a/4772568/7774315 using the XOR operator isn't possible, since we don't have the XOR operator. Since the number of bits in each number is about O(log(n)), trying to implement the XOR operator using normal arithmetic (bit by bit) will take about O(log(n)) actions, which will give us an overall solution of O(nlog(n)).

The closest I have come to solving it is if I had a way to get the sum of all unique values in the array in linear time, I could subtract twice that sum from the overall sum to get (negative) the element that occurs only once, because if the numbers that appear twice are {a1,a2,....,ak} and the number that appears once is x, then the overall sum is
sum=2(a1+...+ak)+x
As far as I know, sets are implemented using hash tables, so using them to find the sum of all unique values is no good.

Addition, subtraction, multiplication and division all count as O(1). — Kobi Mizrachi, Jun 19 '20 at 11:00
The required time complexity is O(n) and we have an array of numbers, so in the worst case we can copy it without worsening the complexity. So, doing whatever you want to the array is allowed. — Kobi Mizrachi, Jun 19 '20 at 14:38
If there is about O(log n) bits in each number, it seems that the maximum value is O(n). Then a kind of counting sort can be performed in O(n) operations and O(n) memory. However, the indication `is about O(log(n))` is vague a little bit. — Damien, Jun 19 '20 at 16:13

score 5 · Answer 1 · answered Jun 19 '20 at 12:24

5

Let's imagine we had a way to find the exact median in linear time and partition the array so all greater elements are on one side and smaller elements on the other. By the parity of expected number of elements, we could identify which side the target element is in. Now perform this routine recursively in the section we identified. Since the section is halved in size each time, the total number of elements traversed cannot exceed O(2n) = O(n).

answered Jun 19 '20 at 12:24

גלעד ברקן

23,602
3
25
61

The number of recursive calls is `log n` though, so we have `O(n log n)`, or am I missing something? – norok2 Jun 19 '20 at 12:54
if you can reorder the list, each recursive call acts on half so many values. – One Lyner Jun 19 '20 at 12:57
@norok yes, you're missing that in each one of those calls, the size of the section we are examining in linear time is halved. 1 + 1/2 + 1/4... converges to 2. – גלעד ברקן Jun 19 '20 at 13:00
I am not aware of a median O(n) algorithm, but is it really needed? Wouldn't pivoting be the mean be enough? – norok2 Jun 19 '20 at 13:45
@norok2 using the mean seems like ot might not guarantee that the section size is halved each time, but I'm not sure. Can you prove or show that it would, amortised? – גלעד ברקן Jun 19 '20 at 13:50
1

The median in linear time is possible via QuickSelect, using the "Median of Median" for pivot selection. For an efficient version of those ideas, see https://erdani.com/research/sea2017.pdf and the corresponding github https://github.com/andralex/MedianOfNinthers . – One Lyner Jun 19 '20 at 13:55
I think this gets "average" halving, but I do not see a simple way of proving it, neither I am finding a distribution counter-example without employing arbitrarily large numbers. – norok2 Jun 19 '20 at 14:35
@norok2: Try [0, 1, 1, 2, 2, 4, 4, ..., 2^k, 2^k] , partitioning around the mean will only remove 2 elements at a time. But agreed, elements have O(n) bits instead of O(log n) – One Lyner Jun 19 '20 at 14:44

Damien · Answer 2 · 2020-06-19T17:27:47.853

The key element in the question seems to be this one:

The number of bits in each number in the array is about O(log(n)).

The issue is that this clue is vague a little bit.

A first approach is to consider that the maximum value is O(n). Then a counting sort can be performed in O(n) operations and O(n) memory.

It will consists in finding the maximum value MAX, setting an integer array C[MAX] and performing directly a classical counting sort thanks to it

C[a[i]]++;

Looking for an odd value in array C[] will provide the solution.

A second approach, I guess more efficient, would be to set an array of size n, each element consisting of an array of unknown size. Then, a kind of almost counting sort would consists in :

C[a[i]%n].append (a[i]);

To find the unique element, we then have to find a sub-array of odd size, and then to examine the elements in this sub-array.

The maximum size k of each sub-array will be about 2*(MAX/n). According to the clue, this value should be very low. Dealing with this sub-array has a complexity O(k), for example by performing a counting sort on the b[j]/n, all the elements being equal modulo n.

We can note that practically, this is equivalent to perform a kind of ad-hoc hashing.

Global complexity is O(n + MAX/n).

norok2 · Answer 3 · 2020-06-19T17:06:03.097

This should do the trick as long as your a dealing with integers of size O(log n). It is a Python implementation of the algorithm sketched @גלעד ברקן answer (including @OneLyner comments), where the median is replaced by a mean or mid-value.

def mean(items):
    result = 0
    for i, item in enumerate(items, 1):
        result = (result * (i - 1) + item) / i
    return result


def midval(items):
    min_val = max_val = items[0]
    for item in items:
        if item < min_val:
            min_val = item
        elif item > max_val:
            max_val = item
    return (max_val - min_val) / 2


def find_singleton(items, pivoting=mean):
    n = len(items)
    if n == 1:
        return items[0]
    else:
        # find pivot - O(n)
        pivot = pivoting(items)
        # partition the items - O(n)
        j = 0
        for i, item in enumerate(items):
            if item > pivot:
                items[j], items[i] = items[i], items[j]
                j += 1
        # recursion on the partition with odd number of elements
        if j % 2:
            return find_singleton(items[:j])
        else:
            return find_singleton(items[j:])

The following code is just for some sanity-checking on random inputs:

def gen_input(n, randomize=True):
    """Generate inputs with unique pairs except one, with size (2 * n + 1)."""
    items = sorted(set(random.randint(-n, n) for _ in range(n)))[:n]
    singleton = items[-1]
    items = items + items[:-1]
    if randomize:
        random.shuffle(items)
    return items, singleton


items, singleton = gen_input(100)
print(singleton, len(items), items.index(singleton), items)
print(find_singleton(items, mean))
print(find_singleton(items, midval))

For a symmetric distribution the median and the mean or mid-value coincide. With the log(n) requirement on the number of bits for the entries, one can show that any arbitrary sub-sampling cannot be skewed enough to provide more than log(n) recursions.

For example, considering the case of k = log(n) bits with k = 4 and only positive numbers, the worst case is: [0, 1, 1, 2, 2, 4, 4, 8, 8, 16, 16]. Here pivoting by the mean will reduce the input by 2 at time, resulting in k + 1 recursive calls, but adding any other couple to the input will not increase the number of recursive calls, while it will increase the input size.

(EDITED to provide a better explanation.)

One Lyner · Answer 4 · 2020-06-19T22:27:09.927

Here is an (unoptimized) implementation of the idea sketched by גלעד ברקן . I'm using Median_of_medians to get a value close enough to the median to ensure the linear time in the worst case.

NB: this in fact uses only comparisons, and is O(n) whatever the size of the integers as long as comparisons and copies are counted as O(1).

def median_small(L):
    return sorted(L)[len(L)//2]

def median_of_medians(L):
    if len(L) < 20:
        return median_small(L)
    return median_of_medians([median_small(L[i:i+5]) for i in range(0, len(L), 5)])

def find_single(L): 
    if len(L) == 1: 
        return L[0] 
    pivot = median_of_medians(L) 
    smaller = [i for i in L if i <= pivot] 
    bigger =  [i for i in L if i > pivot] 
    if len(smaller) % 2: 
        return find_single(smaller) 
    else: 
        return find_single(bigger)

This version needs O(n) additional space, but could be implemented with O(1).

@norok2 No, as it's applied only to lists with less than 20 elements, it's O(1). — One Lyner, Jun 19 '20 at 16:57

Find the element occuring once in an array where all other elements occur twice (without using XOR)

4 Answers4