-5

First of all i know how find first non repeating character if the string contain Ascii table, like: `"abccba.."

the question or the problem is: how can find first non repeating character from string/buffer contain mix letters? i mean we don`t know what the language is! maybe is English or Arabic or is mix between two language, and i must do that in O(n).

  • if we used HashMap then get and put cost O(1) [PROVE]?
  • what the kind of input! is string or another container?
  • 2
    how is having "mix letters" a problem? – Joni Sep 15 '16 at 10:17
  • 2
    Actually, mixed letters could easily give extra problems. Do you need to use Unicode normalisation? How do you handle compositions of diacritics? What exactly *is* a grapheme? – Phylogenesis Sep 15 '16 at 10:23
  • Possible duplicate: http://stackoverflow.com/questions/30711377/finding-first-non-repeating-number-in-integer-array?rq=1. Though the answer given there requires there is only one non-repeating element. I assume the “find first” in this question implies there may be more than one. – Ole V.V. Sep 15 '16 at 14:09
  • Regarding the point made by @Phylogenesis, you might just want to recast this as "Find first non-repeating UTF-16 code unit." (I never understand why people want to explore non-textual algorithms using text for data.) – Tom Blodget Sep 16 '16 at 01:02

3 Answers3

0

Make a <Character, Integer> map counting the number of occurrences, go through your whole String by character at a time and add it to map if it's not already there (with count of 1), or increment count of occurences if it already is there. Then go through your whole String again, and return the first item you find that has count 1. Counting number of occurrences is not necessary here, could be done in a different way or stopped at 2, but it may be useful if you are trying to expand your code later to do more. Of course you need to take care of potential unicode problems or something if you are using arabic letters, but I think that's not what you're asking about. This solution is O(N), you are going through your String 2 times, and map operations are cheaper.

Shadov
  • 5,421
  • 2
  • 19
  • 38
  • Make sure to instantiate your `HashMap` with a capacity of, say, 4 * (string.length() + 1) / 3 to avoid extension and rehashing. – Ole V.V. Sep 15 '16 at 10:41
  • Does this fulfil the O(n) requirement? “Big O specifically describes the worst-case scenario” (quoted from https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/). In the worst case you have lots of hash collisions, so how could your hash map lookups still be in O(1)?? – Ole V.V. Sep 15 '16 at 10:56
  • Algorithm itself is `O(N)`, you are right with hash collisions, but it's programmer's task to take care of that and optimize the hashes so it's optimal - in my opinion of course. – Shadov Sep 15 '16 at 11:01
  • If we don’t know what characters will be in the string, I don’t think we can rely on `Character.hashCode()` being good enough, nor on writing a better hash function. Just for being completely strict and maybe taking the role of the devil’s advocate. :-) – Ole V.V. Sep 15 '16 at 11:06
0

In case the HashMap solution is not good enough, here’s what I can think of: Make two instances of java.util.BitSet, seenOnce and seenTwice, each with 0x10000 bits. For each char, if it is already in seenOnce, add it to seenTwice, otherwise add it to seenOnce. Second time throgh the string, print out the first character not in seenTwice.

Ole V.V.
  • 81,772
  • 15
  • 137
  • 161
0

In C#... O(n) time complexity...

public char GetFirstRecuringChar(string text)
{
    if (string.IsNullOrEmpty(text) || string.IsNullOrWhiteSpace(text))
    {
        throw new ArgumentNullException();
    }

    IDictionary<char, CharIndex> map = new Dictionary<char, CharIndex>();

    for (int i = 0; i < text.Length; i++)
    {
        if (map.ContainsKey(text[i]))
        {
            map[text[i]].Count++;
        }
        else
        {
            CharIndex ci = new CharIndex { Index = i, Ch = text[i], Count = 1 };
            map.Add(text[i], ci);
        }
    }

    int lowestIndex = int.MaxValue;
    foreach (CharIndex charIndex in map.Values)
    {
        if (charIndex.Count == 1)
        {
            if (lowestIndex > charIndex.Index)
            {
                lowestIndex = charIndex.Index;
            }
        }
    }

    char answer = '\n';
    if (lowestIndex != int.MaxValue)
    {
        foreach (CharIndex charIndex in map.Values)
        {
            if (charIndex.Index == lowestIndex)
            {
                answer = charIndex.Ch;
                break;
            }
        }
    }

    return answer;
}

private class CharIndex
{
    public char Ch { get; set; }
    public int Index { get; set; }
    public int Count { get; set; }
}

Unit tests:

[TestCase("ississippitotalm", 'o')]
[TestCase("a", 'a')]
[TestCase("teeter", 'r')]
[TestCase("teeterxyz", 'r')]
[TestCase(".......................................x.................f.........y...xkiuytreeee", 'f')]
public void GetFirstRecuringCharTest(string text, char expectedAnswer)
{
    char result = runner.GetFirstRecuringChar(text);
    Assert.That(result, Is.EqualTo(expectedAnswer));
}
Yawar Murtaza
  • 3,655
  • 5
  • 34
  • 40