
I recently asked this question and the answers increased my understanding, but they didn't solve the actual problem I had. So, I will try to ask a similar but different question as follows.

Suppose that I want to access the rune at an arbitrary index of a string. One way is:

func RuneElement(str string, idx int) rune {
  n := 0 // counts runes; the index yielded by range is a byte offset, not a rune index
  for _, c := range str {
    if n == idx {
      return c
    }
    n++
  }
  return 0 // out of range -> proper handling is needed
}

What if I want to call such a function many times? I guess what I am looking for is an operator or function like str[i] (which returns a byte) that returns the rune at the i-th position. Why can this element be accessed using for ... range but not through a function like str.At(i), for example?

Eissa N.
  • If you don't want to convert a `string` to `[]rune` in every call, you need to use `[]rune` – JimB Jun 13 '17 at 16:45
  • @JimB But, my input is a string and I try to avoid conversion of `string` to `[]rune` – Eissa N. Jun 13 '17 at 16:51
  • My point is that you need to convert a `string` to a `[]rune` in order to index it as such. If you don't want to repeatedly convert the `string`, then use a `[]rune` as the argument type, and convert it once. – JimB Jun 13 '17 at 17:03
  • Or are you simply looking for: https://play.golang.org/p/RdH7oMCHIZ? – JimB Jun 13 '17 at 17:08
  • @JimB Yes, that is what I am looking for, but without conversion from `string` to `[]rune`. It seems that it's not possible though because of the way `string` is designed as @icza mentions [here](https://stackoverflow.com/a/44527543/2229960) – Eissa N. Jun 13 '17 at 17:12

1 Answer


string values in Go store the UTF-8 encoded byte sequence of the text. This is a design decision that has been made and it won't change.

If you want to get the rune at an arbitrary index, you have to decode the bytes from the start of the string; there is no way around that (the for ... range loop does this decoding). There is no "shortcut": because UTF-8 is a variable-width encoding, the chosen representation just doesn't provide constant-time rune indexing out of the box.
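To make the decoding step concrete, here is a minimal sketch that walks the string with utf8.DecodeRuneInString from the standard unicode/utf8 package. The function name runeAtDecode and its (rune, bool) return shape are assumptions made for illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAtDecode (hypothetical helper) returns the idx-th rune of s by decoding
// the UTF-8 bytes from the start. The bool is false when idx is out of range.
// Cost is O(n) per call and it allocates nothing.
func runeAtDecode(s string, idx int) (rune, bool) {
	if idx < 0 {
		return 0, false
	}
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if idx == 0 {
			return r, true
		}
		idx--
		s = s[size:] // skip the bytes of the rune just decoded
	}
	return 0, false
}

func main() {
	s := "h\u00e9llo" // "héllo": the second rune takes two bytes in UTF-8

	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 5 (runes)

	r, ok := runeAtDecode(s, 1)
	fmt.Printf("%c %v\n", r, ok)
}
```

Indexing with s[1] here would yield only the first byte of the two-byte sequence, which is why a decoding pass is unavoidable.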

If you have to do this frequently, you should change your input type and use a []rune instead of a string: it's a slice, so it can be indexed efficiently. string in Go is not []rune. string in Go is effectively a read-only []byte (UTF-8). Period.
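As a rough sketch of the "convert once, index many times" approach, the snippet below wraps a []rune in a small type with an At method, echoing the str.At(i) wished for in the question. The type name runeString and the method At are hypothetical names chosen for this example; they are not part of Go's standard library.

```go
package main

import "fmt"

// runeString is a hypothetical wrapper around []rune used for illustration.
type runeString []rune

// At returns the rune at rune index i, and false if i is out of range.
// This is a plain slice index, so it runs in constant time.
func (rs runeString) At(i int) (rune, bool) {
	if i < 0 || i >= len(rs) {
		return 0, false
	}
	return rs[i], true
}

func main() {
	s := "h\u00e9llo" // "héllo"

	// One O(n) conversion and allocation up front.
	rs := runeString([]rune(s))

	// Every lookup afterwards is a cheap slice index.
	for i := 0; i < len(rs); i++ {
		r, _ := rs.At(i)
		fmt.Printf("%d: %c\n", i, r)
	}
}
```

The trade-off is memory: a []rune uses four bytes per code point, whereas the UTF-8 string uses one to four.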

If you can't change the input type, you may build an internal cache mapped from string to its []rune:

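var cache = map[string][]rune{}

// RuneAt returns the rune at rune index idx of s, or 0 if idx is out of range.
func RuneAt(s string, idx int) rune {
    rs, ok := cache[s]
    if !ok {
        rs = []rune(s) // convert once and remember the result
        cache[s] = rs
    }
    if idx < 0 || idx >= len(rs) {
        return 0
    }
    return rs[idx]
}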

Whether this is worth it depends on the use case: if RuneAt() is called with a small set of strings, this may improve performance a lot. If the passed strings are more or less unique, this will result in worse performance and a lot of memory usage, since the cache never evicts anything. Also, this implementation is not safe for concurrent use.
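If concurrent use is needed, one possible variant guards the map with a sync.RWMutex. This is only a sketch under assumptions: the type runeCache, its constructor newRuneCache, and the (rune, bool) return shape are hypothetical names introduced here, and the cache still grows without bound.

```go
package main

import (
	"fmt"
	"sync"
)

// runeCache is a hypothetical concurrency-safe variant of the cache idea.
type runeCache struct {
	mu sync.RWMutex
	m  map[string][]rune
}

func newRuneCache() *runeCache {
	return &runeCache{m: make(map[string][]rune)}
}

// RuneAt returns the rune at rune index idx of s, and false if out of range.
func (c *runeCache) RuneAt(s string, idx int) (rune, bool) {
	c.mu.RLock()
	rs, ok := c.m[s]
	c.mu.RUnlock()
	if !ok {
		// Two goroutines may both convert the same string here; that is
		// wasteful but harmless, because the resulting slices are equal.
		rs = []rune(s)
		c.mu.Lock()
		c.m[s] = rs
		c.mu.Unlock()
	}
	if idx < 0 || idx >= len(rs) {
		return 0, false
	}
	return rs[idx], true
}

func main() {
	c := newRuneCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r, ok := c.RuneAt("h\u00e9llo", i)
			fmt.Printf("%d: %c %v\n", i, r, ok)
		}(i)
	}
	wg.Wait()
}
```

The read lock keeps the common cache-hit path cheap; whether the extra locking pays off compared to decoding on every call is something to measure for the actual workload.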

icza