-1

This question is an extension of the following question:

Find the first un-repeated character in a string

My question is how to do this if instead of a string, we have a running stream of characters. Would the approach to use two arrays 'chars' and 'visited' as discussed in this solution work fine for this problem too?

Edit: Sample input - 3333111124 Here 2 is the first non-repeating character, assume that the query findCharacter() is made at the time when the last character seen in stream is '4'.

Note: The stream wont be stored anywhere(in contrast to the above problem where the string is already available in memory), we will only have access to the last seen character of the stream.

Community
  • 1
  • 1
pankaj
  • 1,316
  • 3
  • 16
  • 27
  • 1
    Please put all relevant parts into the question, so it is self-sufficient – Kromster Jul 10 '14 at 09:32
  • 2
    @KromStern: All relevant parts are already in *the title!* – j_random_hacker Jul 10 '14 at 10:12
  • Updated the question to make it more clear. – pankaj Jul 10 '14 at 10:39
  • There is no need to link to another question if it can be copied right here - it is not that long or anything. Having this link in question and referring to it ("My question is how to do **this** if instead of a string") just adds noise and unneeded opening of that link just to see the exact same text repeated. – Kromster Jul 10 '14 at 14:39

5 Answers5

3

The idea is to use a DLL (Doubly Linked List) to efficiently get the first non-repeating character from a stream. The DLL contains all non-repeating characters in order, i.e., the head of DLL contains first non-repeating character, the second node contains the second non-repeating and so on.

We also maintain two arrays: one array is to maintain characters that are already visited two or more times, we call it repeated[], the other array is array of pointers to linked list nodes, we call it inDLL[]. The size of both arrays is equal to alphabet size which is typically 256.

1) Create an empty DLL. Also create two arrays inDLL[] and repeated[] of size 256. 
   inDLL is an array of pointers to DLL nodes. repeated[] is a boolean array, 
   repeated[x] is true if x is repeated two or more times, otherwise false. 
   inDLL[x] contains pointer to a DLL node if character x is present in DLL, 
   otherwise NULL. 

2) Initialize all entries of inDLL[] as NULL and repeated[] as false.

3) To get the first non-repeating character, return character at head of DLL.

4) Following are steps to process a new character 'x' in stream.
  a) If repeated[x] is true, ignore this character (x is already repeated two
      or more times in the stream) 
  b) If repeated[x] is false and inDLL[x] is NULL (x is seen first time)
     Append x to DLL and store address of new DLL node in inDLL[x].
  c) If repeated[x] is false and inDLL[x] is not NULL (x is seen second time)
     Get DLL node of x using inDLL[x] and remove the node. Also, mark inDLL[x] 
     as NULL and repeated[x] as true.

Note that appending a new node to DLL is O(1) operation if we maintain tail pointer. Removing a node from DLL is also O(1). So both operations, addition of new character and finding first non-repeating character take O(1) time.

Below is the code in C++

// A C++ program to find first non-repeating character from a stream of characters
#include <iostream>
#define MAX_CHAR 256
using namespace std;

// A linked list node
struct node
{
    char a;
    struct node *next, *prev;
};

// A utility function to append a character x at the end of DLL.
// Note that the function may change head and tail pointers, that
// is why pointers to these pointers are passed.
void appendNode(struct node **head_ref, struct node **tail_ref, char x)
{
    struct node *temp = new node;
    temp->a = x;
    temp->prev = temp->next = NULL;

    if (*head_ref == NULL)
    {
        *head_ref = *tail_ref = temp;
        return;
    }
    (*tail_ref)->next = temp;
    temp->prev = *tail_ref;
    *tail_ref = temp;
}

// A utility function to remove a node 'temp' fromt DLL. Note that the
// function may change head and tail pointers, that is why pointers to
// these pointers are passed.
void removeNode(struct node **head_ref, struct node **tail_ref,
                struct node *temp)
{
    if (*head_ref == NULL)
        return;

    if (*head_ref == temp)
        *head_ref = (*head_ref)->next;
    if (*tail_ref == temp)
        *tail_ref = (*tail_ref)->prev;
    if (temp->next != NULL)
        temp->next->prev = temp->prev;
    if (temp->prev != NULL)
        temp->prev->next = temp->next;

    delete(temp);
}

void findFirstNonRepeating()
{
    // inDLL[x] contains pointer to a DLL node if x is present in DLL.
    // If x is not present, then inDLL[x] is NULL
    struct node *inDLL[MAX_CHAR];

    // repeated[x] is true if x is repeated two or more times. If x is
    // not seen so far or x is seen only once. then repeated[x] is false
    bool repeated[MAX_CHAR];

    // Initialize the above two arrays
    struct node *head = NULL, *tail = NULL;
    for (int i = 0; i < MAX_CHAR; i++)
    {
        inDLL[i] = NULL;
        repeated[i] = false;
    }

    // Let us consider following stream and see the process
    char stream[] = "3333111124";
    for (int i = 0; stream[i]; i++)
    {
        char x = stream[i];
        cout << "Reading " << x << " from stream \n";

        // We process this character only if it has not occurred or occurred
        // only once. repeated[x] is true if x is repeated twice or more.s
        if (!repeated[x])
        {
            // If the character is not in DLL, then add this at the end of DLL.
            if (inDLL[x] == NULL)
            {
                appendNode(&head, &tail, stream[i]);
                inDLL[x] = tail;
            }
            else // Otherwise remove this caharacter from DLL
            {
                removeNode(&head, &tail, inDLL[x]);
                inDLL[x] = NULL;
                repeated[x] = true;  // Also mark it as repeated
            }
        }

        // Print the current first non-repeating character from stream
        if (head != NULL)
            cout << "First non-repeating character so far is " << head->a << endl;
    }
}

/* Driver program to test above function */
int main()
{
    findFirstNonRepeating();
    return 0;
}

Output will be like:

Reading 3 from stream 
First non-repeating character so far is 3
Reading 3 from stream 
Reading 3 from stream 
Reading 3 from stream 
Reading 1 from stream 
First non-repeating character so far is 1
Reading 1 from stream 
Reading 1 from stream 
Reading 1 from stream 
Reading 2 from stream 
First non-repeating character so far is 2
Reading 4 from stream 
First non-repeating character so far is 2
Ayush
  • 2,608
  • 2
  • 22
  • 31
  • This works, Thanks!! But I still have a doubt, if you have seen the answer that i mentioned in my question above, he is using a queue and an array to do the job, IMO you are also using a DLL to implement some kind of priority queue and a book-keeping array. Is my understanding correct here? – pankaj Jul 10 '14 at 10:50
  • O(1) time complexity :) – Ayush Jul 10 '14 at 10:51
  • You could use a fixed array instead of the DLL. At the end of the processing you have to scan it to find the first non-deleted entry, but the number of scan iterations you do here is less than the number of `new` and `delete` calls you did with the DLL, so this must be faster – M.M Jul 10 '14 at 11:21
  • @pankaj I have not seen the solution of the link you pasted but yes I am using a book-keeping array to reject repeating elements and a priority queue(which is FIFO) of the non-repeating characters of the stream and yes you are welcome. :) – Ayush Jul 10 '14 at 16:20
0

try like below using hashmap in o(n)

int main()

{

int a[256]={0};

char *b="Helloworldd";

int i;

for(i=0;i<strlen(b);++i)

a[b[i]]++;

for(i=0;i<strlen(b);++i)

if(a[b[i]]>;1)

{

        printf("First Repeating %c\n",b[i]);

        break;
}

for(i=strlen(b)-1;i;--i)

if(a[b[i]]>;1)

{

        printf("Last Repeating %c\n",b[i]);

        break;
}

}

Anand
  • 621
  • 3
  • 9
  • 31
0

You will not require two arrays here. Full code upon my understanding of the question

int main(){
    bool * visited = new bool[256];
    memset(visited,false,256); //initialize to false
    char character;
    char answer = 'a'; #random initialization
    while(1){
        cin>>character;
        if (!visited['character']) answer = character; //first non-repeating character at that instance
        visited['character'] = True;
        //Now embed your find character query according to your need
        //For example:
        if ('character' == '4') {
                cout<<answer<<endl;
                return 0;
         }
    }
    return 0;
}
gmfreak
  • 399
  • 4
  • 12
0

Here is the Java implementation using BitSet (instead of Boolean array) and rxJava

import java.util.BitSet;
import rx.Observable;
import rx.functions.Action1;

public class CharacterStream {

private static int  MAX_CHAR =  256;

private Node<Character> head;
private Node<Character> tail;

private BitSet repeated;

private Node<Character>[] inDLL;

@SuppressWarnings("unchecked")
public CharacterStream(final Observable<Character> inputStream) {

    repeated = new BitSet(MAX_CHAR);
    inDLL = new Node[MAX_CHAR];
    for (int i = 0; i < MAX_CHAR; i++) {
        inDLL[i] = null;
    }

    inputStream.subscribe(new Action1<Character>() {

        @Override
        public void call(Character incomingCharacter) {
            System.out.println("Received -> " + incomingCharacter);
            processStream(incomingCharacter);   
        }
    });
}

public Character firstNonRepeating() {
    if (head == null) {
        return null;
    }
    return head.item;
}

private void processStream(Character chr) {
    int charValue = (int) chr.charValue();
    if (!repeated.get(charValue)) {

        if (inDLL[charValue] == null) {
            appendToTail(chr);
            inDLL[charValue] = tail;
        } else {
            removeNode(inDLL[charValue]);
            inDLL[charValue] = null;
            repeated.set(charValue);
        }
    }
}

private void removeNode(Node<Character> node) {
    if (head == null) {
        return ;
    } 

    if (head == node) {
        head = head.next;
        //head.prev = null;
    } 
    if (tail == node) {
        tail = tail.prev;
        //tail.next = null;
    }

    if (node.next != null) {
        node.next.prev= node.prev;
    }
    if (node.prev != null) {
        node.prev.next = node.next;
    }
}

private void appendToTail(Character chr) {
    Node<Character> temp = new Node<Character>(chr);
    if (head == null) {
        head = temp;
        tail = temp;
    }
    tail.next = temp;
    temp.prev = tail;
    tail = temp;
}

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;
    public Node(E val) {
        this.item = val;
    }
}

}

Here is the unit test cases

@Test
public void firstNonRepeatingCharInAStreamTest() {

    Observable<Character> observable = Observable.create(new OnSubscribe<Character>() {

        @Override
        public void call(Subscriber<? super Character> subscriber) {
            subscriber.onNext('N');
            subscriber.onNext('Z');
            subscriber.onNext('B');
            subscriber.onNext('C');
            subscriber.onNext('D');
            subscriber.onNext('A');
            subscriber.onNext('C');
            subscriber.onNext('B');
            subscriber.onNext('A');
            subscriber.onNext('N'); 
            //subscriber.onNext('Z');   
        }           
    });

    CharacterStream charStream = new CharacterStream(observable);
    assertThat(charStream.firstNonRepeating(), equalTo('Z'));
  }
craftsmannadeem
  • 2,665
  • 26
  • 22
0
import java.util.*;

class uniqueElem{
    public static HashSet<Integer> hs = new HashSet<Integer>();

    public static void checkElem(){
        Scanner sc = new Scanner(System.in);
        boolean flag = true;
        while(flag == true){
            int no = sc.nextInt();
            if(no == -1) break;

            boolean check = hs.add(no);
            if(check == true){
                System.out.println("Occur first time: "+no);
            }
        }
    }
    public static void main(String args[]){
        checkElem();
    }
}
Arsen Davtyan
  • 1,891
  • 8
  • 23
  • 40