Find the first element that occurs only once

Question

This was a Google interview puzzle.

The problem is to find the first element in an array that occurs only once.

For example, abaaacdgadgf is given. We need to output b.

The simple solution seems to be to first count every element using a hashtable, and then loop through again to get the first element. It will use 2 loops.

Is it possible to get the result only use 1 loop?

I tried to figure it out, but it seems impossible.

@JanDvorak: the accepted answer to the linked question is very poor so that question is in essence unanswered. — n. m. could be an AI, Jul 09 '13 at 09:38
@JanDvorak: the code in the accepted answer is just a naïve quadratic algorithm, and a poorly implemented one at that. It would not get you very far in a Google interview. — n. m. could be an AI, Jul 09 '13 at 09:56
@n.m. ah, right. However, if you replace an ordinary list with a linked hash... — John Dvorak, Jul 09 '13 at 09:59
@n.m. Isn't the correct action then to put a bounty on that question instead of asking another? But since OP doesn't have that much reputation, I'm not really sure. — Bernhard Barker, Jul 09 '13 at 11:14
Aah... Found a solution. Posted it here http://stackoverflow.com/a/17549192/1903116. **The key idea here, is to process the data backwards.** — thefourtheye, Jul 09 '13 at 13:18

Matt · Answer 1 · 2013-07-09T09:54:46.247

The hash table points to items in a linked list. When adding an item the hashtable entry is created and the pointer added to the tail of the list. When a duplicate is found then the item can be removed from the list.

The first element to occur only once will be the first item in the list.

This code is a little untidy because most of the code is linked list implementation.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

typedef struct stLISTITEM
{
    char data;
    struct stLISTITEM* previous;
    struct stLISTITEM* next;
} LISTITEM;

char firstCharThatOccursOnce(const char* s) {
    char ret;
    LISTITEM* head;
    LISTITEM* tail;
    LISTITEM* table[CHAR_MAX + 1] = {NULL}; /* Just pretend this is a hash table please */
    LISTITEM* cur;
    int i;

    head = malloc(sizeof(*head));
    tail = malloc(sizeof(*tail));

    head->next = tail;
    tail->previous = head;
    tail->data = '\0'; /* If all characters are repeated then return NULL character */

    for (; *s; s++) {
        cur = table[*s];

        if (cur == NULL) {
            /* Item hasn't been seen before */

            cur = malloc(sizeof(*cur));
            cur->data = *s;

            /* Add it to the end of the list */
            tail->previous->next = cur;
            cur->previous = tail->previous;
            tail->previous = cur;
            cur->next = tail;

            /* Add it to the table */
            table[*s] = cur;
        }
        else if (cur->next == NULL) {
            /* Seen it before, but already removed */
        }
        else {
            /* Seen it before, remove from list */
            cur->previous->next = cur->next;
            cur->next->previous = cur->previous;

            cur->next = NULL;
            cur->previous = NULL;
        }
    }

    ret = head->next->data;

    for (i = 0; i <= CHAR_MAX; i++) {
        free(table[i]);
    }

    free(head);
    free(tail);

    return ret;
}

int main(int argc, char const *argv[])
{
    char result = firstCharThatOccursOnce("abaaacdgadgf");

    printf("'%c' (%i)\n", result, result);

    return 0;
}

@Matt what is the time complexity and space complexity of your approach? — Aravind, Jul 09 '13 at 12:16
@Matt Figured it out: Time complexity: O(N*D) where D is the number of distinct characters in input string.And Space Complexity: O(D) Am I right? — Aravind, Jul 09 '13 at 12:25
time complexity is O(N),i guess.Why it need to use double linklist?signle linklist can't fit? — liumilan, Jul 09 '13 at 14:35
@liumilan: How? Are you doing constant amount of work for each element? — Aravind, Jul 09 '13 at 14:48
@Aravind, Why would you multiply it though? You don't iterate through N items every time you find a distinct one. — Matt, Jul 09 '13 at 15:59
@liumilan, It's doubly linked because I read from the back and from the front. Doubly linked lists allow you to delete an item in the middle without iterating over the list, singly linked won't let you do that. — Matt, Jul 09 '13 at 16:00

score 2 · Answer 2 · answered Jul 09 '13 at 09:03

2

Here my solution:

Each 'char' has 4 stats possible:

1 : never seen.
2 : seen one
3 : eliminated because of multi occurence.
4 : qualified

I create an array of size 26(for each 'char') for storing stats of chars Qualified elements are put at the end of dual linked list.

Scan the input data and do all the update as necessary. Then scan the list from begin to end. The first non 'eliminated (state 3)' is your answser.

complexity : n+(26x3) where n = length(dataset)

answered Jul 09 '13 at 09:03

Galigator

8,957
2
25
39

Nothing in the question (except the example) indicates that there's only 26 possible values. – Bernhard Barker Jul 09 '13 at 09:04
1

What about chinese texts? Or arabic? Or german? – RedX Jul 09 '13 at 09:07
You are right, I said 26 because of the problem presentation and the C tag (I think we don't speak about unicode, but C chars). For an arbitrary hight number of char, you can replace this array with an hashmap. Non existing element in the map will be supposed to be in state 'never seen'. With this technics complexity will always be ( number of different char x n ). – Galigator Jul 09 '13 at 09:27
The question asks if it is possible with only 1 loop though. – Matt Jul 09 '13 at 09:58
The solution do only one loop over input data. – Galigator Jul 09 '13 at 09:59
@MrK, it doesn't say input data, it says "Is it possible to get the result only use 1 loop?" – Matt Jul 09 '13 at 10:03
Sorry I translate the "one loop" into 'n' complexity question. A one loop question is so strange. Many sub parts of programs have loop inside, malloc, freeing, etc... – Galigator Jul 09 '13 at 10:15

Sneftel · Answer 3 · 2013-07-09T09:15:41.143

2

Yes. In the hashtable, instead of maintaining counts, maintain the first index where the element was encountered. Also maintain a sorted set of all unique-so-far elements, keyed on that index. Afterwards, just look for the minimal key remaining in the sorted set.

encountered = dict()
unique = sorted_set()

for i in range(len(A)):
    elem = A[i]
    if elem in encountered:
        first_index = encountered[elem]
        del unique[first_index]
    else:
        unique[i] = elem
        encountered[elem] = i

min_index = unique.keys()[0]
first_unique_elem = A[min_index]

edited Jul 09 '13 at 09:15

answered Jul 09 '13 at 09:04

Sneftel

40,271
12
71
104

`min` is implicitly a loop. – Fred Foo Jul 09 '13 at 09:06
1

That's why a sorted set is preferable to a dict. But Python doesn't have a well-known one. Change `unique = dict()` to `unique = sorted_set()` and `min_index = min(unique.keys())` to `min_index = unique.keys()[0]` if you like. – Sneftel Jul 09 '13 at 09:14
@Ben Use a [`collections.OrderedDict`](http://docs.python.org/2/library/collections.html#collections.OrderedDict)? – David Eisenstat Jul 09 '13 at 12:15

Aravind · Answer 4 · 2013-07-09T14:03:36.840

I haven't read the other answers simply because I want to give it a go myself.
Let's iteratively improve our solution.
Our analysis in terms of time and space complexity will require us to state a few things clearly at first:
Let

N = length of string 
M = numbers of characters in alphabet

Brute force algorithm is to traverse the string and for each element of string we search to its right to see if it has a duplicate.
Time Complexity:O(N²)
Space Complexity:O(1)

Can we do better?
Sure, we can traverse the string and make count of many times a character appears.Make another traversal through the string to find the first character that has count 1.
Time Complexity:O(N+M)
Space Complexity:O(M)

Why is this O(N+M)?
Because we need to initialize the elements of the count array to 0 first.So even if input is "a", we need to initialize the count array for M elements.

Can we do better?
First let's state to the interviewer that this task is Omega(N), simply because we have to see each element of the string atleast once.Realize this by seeing a input instance of "aaaaaaz"
So we are not aiming to better our time complexity, simply making the actual running time better by doing just one traversal through the string.
This is indeed possible.

for(int i=0;i<N;i++)
{
  if(visited[i]==-2)continue;
  if(visited[i]==-1)visited[i]=i;continue;
  visited[i]=-1;
}
int F=N;
char res=' ';
for(int i=0;i<M;i++)
{
  if(visited[i]>=0)
  {
    F=min(F,visited[i]);
    res=A[visited[i]];
  }
}
return res;

Time Complexity:O(N+M)
Space Complexity: O(M)

Can we do better?

Can we do this in O(N) perhaps?
I'm still thinking of a way to do this in true O(N).IF I hit upon a solution, I will update this answer.

In fact ,your method is similar to the common method,count the number,and find out the first — liumilan, Jul 09 '13 at 14:30
I never claimed my method to be superior to any listed before.That is why I put the first line saying I haven't read the other answers before answering.Sorry if it's a repeat.Answering this helped me think over a few points. — Aravind, Jul 09 '13 at 14:50
@liumilan: I just noticed that you are the questioner, did you have a google interview? BTW, I have pointed an important fact that none of the analysis is truly O(N).People seem to overlook that often. — Aravind, Jul 09 '13 at 17:03

score 1 · Answer 5 · answered Jul 09 '13 at 12:57

Instead of a hash table you may use a trie. If input data conspire against your hash function, a hashtable will get you quadratic performance. The trie is immune to that.

As for another loop, I would not worry about it too much. It's the same complexity asymptotically. Whatever you win by eliminating the loop, you are likely to lose in increased complexity of the rest of the code.

Find the first element that occurs only once

5 Answers5