2

I'm trying to write a function that returns the finds first character in a String that doesn't repeat, so far I have this:

package main

import (
    "fmt"
    "strings"
)

func check(s string) string {

    ss := strings.Split(s, "")
    smap := map[string]int{}

    for i := 0; i < len(ss); i++ {
        (smap[ss[i]])++

    }

    for k, v := range smap {

        if v == 1 {
            return k
        }
    }

    return ""
}

func main() {
    fmt.Println(check("nebuchadnezzer"))

}

Unfortunately in Go when you iterate a map there's no guarantee of the order so every time I run the code I get a different value, any pointers?

jwesonga
  • 4,249
  • 16
  • 56
  • 83
  • You didn't define what order you wanted. Input order? Codepoint order? And are you treating the source as an 8-bit bytestring, or a unicode string? – krait May 25 '14 at 16:45
  • @Krait, is there really a point for not treating a string as a unicode string? he would be using []byte if he didn't care about unicode I'd assume. – OneOfOne May 25 '14 at 17:13
  • @OneOfOne strings in Go are *conventionally* encoded as utf8, but they can also be treated as any arbitrary bytestring. There's no community emphasis that []byte be used exclusively when a "string" is known to not be utf8. – krait May 25 '14 at 17:17

3 Answers3

6

Using a map and 2 loops : play

func check(s string) string {
    m := make(map[rune]uint, len(s)) //preallocate the map size
    for _, r := range s {
        m[r]++
    }

    for _, r := range s {
        if m[r] == 1 {
            return string(r)
        }
    }
    return ""
}

The benfit of this is using just 2 loops vs multiple loops if you're using strings.ContainsRune, strings.IndexRune (each function will have inner loops in them).

OneOfOne
  • 95,033
  • 20
  • 184
  • 185
  • Use of uint16 is incorrect. Your algorithm would produce incorrect output on a string containing (1 + any multiple of 65536) duplicates of any rune. Use int instead of uint16, as no Go string can be more than "max int" bytes long, which implies that it cannot be more than that many runes long as well. – krait May 25 '14 at 17:20
  • Good catch, switching it to uint. – OneOfOne May 25 '14 at 17:22
  • +1. That seems certainly more optimized than my answer. – VonC May 25 '14 at 17:27
  • OneofOne I selected your answer since it seemed to have less moving parts, and eliminated the inner loops contained in the strings functions. Is there a way to optimize the function and have one loop? – jwesonga May 25 '14 at 18:03
  • As far as I can tell, no, you have to have some kind of a 2nd loop, either that way or more hidden using `strings.Contains`, etc. – OneOfOne May 25 '14 at 20:54
2

Efficient (in time and memory) algorithms for grabbing all or the first unique byte http://play.golang.org/p/ZGFepvEXFT:

func FirstUniqueByte(s string) (b byte, ok bool) {
    occur := [256]byte{}
    order := make([]byte, 0, 256)
    for i := 0; i < len(s); i++ {
        b = s[i]
        switch occur[b] {
        case 0:
            occur[b] = 1
            order = append(order, b)
        case 1:
            occur[b] = 2
        }
    }
    for _, b = range order {
        if occur[b] == 1 {
            return b, true
        }
    }
    return 0, false
}

As a bonus, the above function should never generate any garbage. Note that I changed your function signature to be a more idiomatic way to express what you're describing. If you need a func(string) string signature anyway, then the point is moot.

krait
  • 785
  • 4
  • 11
  • +1. More optimized than my answer, and more idiomatic as well. – VonC May 25 '14 at 17:29
  • But mine specifically won't handle unicode, so optimization is moot if unicode is needed. My order variable is also a micro-optimization which won't really help with typical inputs, such as "kitten", but would skip looking at, e.g. 5000 bytes in the second loop when the input is 5001 a's followed by a single b. – krait May 25 '14 at 17:41
  • @krait: -1. You only handle ASCII characters (<= 0x7F). – peterSO May 25 '14 at 17:59
  • @peterSO No, I handle opaque bytes (<= 0xFF). I'm not sure where you got ASCII from. The OP didn't specify, and in some situations is perfectly legitimate to treat strings as a sequence of bytes (in which case unicode handling will produce an incorrect result); this is a solution for such cases. For when unicode handling is needed, OneOfOne has a good solution. – krait May 25 '14 at 18:16
0

That can certainly be optimized, but one solution (which isn't using map) would be:
(playground example)

func check(s string) string {
    unique := ""
    for pos, c := range s {
        if strings.ContainsRune(unique, c) {
            unique = strings.Replace(unique, string(c), "", -1)
        } else if strings.IndexRune(s, c) == pos {
            unique = unique + string(c)
        }
    }
    fmt.Println("All unique characters found: ", unique)
    if len(unique) > 0 {
        _, size := utf8.DecodeRuneInString(unique)
        return unique[:size]
    }
    return ""
}

This is after the question "Find the first un-repeated character in a string"

krait suggested below that the function should:

return a string containing the first full rune, not just the first byte of the utf8 encoding of the first rune.

Community
  • 1
  • 1
VonC
  • 1,262,500
  • 529
  • 4,410
  • 5,250
  • `rune -> string` and `byte -> string` conversion is disfavored, although it's stuck in the language because of the Go 1 guarantee. Also, you're not actually producing a string containing the first unique rune; you're producing a string of the first utf8 byte of the first unique utf8-byte-prefix. – krait May 25 '14 at 17:31
  • By using string concatenation as you are, you're producing one piece of garbage per input rune, and the algorithm is O(n^2), rather than the optimal O(n). Also, fmt.Println inserts spaces for you. – krait May 25 '14 at 17:32
  • @krait Sure, I was hoping someone with more go-knowledge was coming along to illustrate a correct algorithm. – VonC May 25 '14 at 17:42
  • aside from the unique[0] part, your algorithm is mostly correct. The rest is just optimization. – krait May 25 '14 at 17:48
  • @krait sorry for the rejected edit (http://stackoverflow.com/review/suggested-edits/4900866) by mindless edit-reviewers robots ;) I have included your edit manually, and modified the playground example accordingly. – VonC May 25 '14 at 17:55
  • Hah, no problem. I was originally putting it in a comment, but it was hard to contextually describe. – krait May 25 '14 at 21:54