40

I wonder how I can get a Unicode character from a string. For example, if the string is "你好", how can I get the first character "你"?

From another place I get one way:

var str = "你好"
runes := []rune(str)
fmt.Println(string(runes[0]))

It does work. But I still have some questions:

  1. Is there another way to do it?

  2. Why does str[0] in Go return a byte rather than a Unicode character?

Dave C
赵浩翔

3 Answers

44

First, you may want to read https://blog.golang.org/strings, which answers some of your questions.

A string in Go can contain arbitrary bytes. When you write str[i], the result is a byte, and the index is always a byte offset.
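To make the byte-versus-rune distinction concrete, here is a minimal sketch using the string from the question. The helper name describe is made up for illustration, and it assumes a non-empty string (s[0] panics on an empty one).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper (not part of any library). It returns the
// length of s in bytes, the number of runes in s, and the first byte of s.
// It assumes s is non-empty; s[0] would panic otherwise.
func describe(s string) (byteLen int, runeCount int, firstByte byte) {
	return len(s), utf8.RuneCountInString(s), s[0]
}

func main() {
	n, r, b := describe("你好")
	fmt.Println(n, r)      // 6 2: two runes, three bytes each in UTF-8
	fmt.Printf("%#x\n", b) // 0xe4: only the first byte of "你", not the character
}
```

Indexing gives back one byte of the three-byte encoding of "你", which is why str[0] cannot be printed as the character.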

Most of the time, strings are encoded in UTF-8 though. You have multiple ways to deal with UTF-8 encoding in a string.

For instance, you can use the for...range statement to iterate on a string rune by rune.

var first rune
for _, c := range str {
    first = c
    break
}
// first now contains the first rune of the string

You can also leverage the unicode/utf8 package. For instance:

r, size := utf8.DecodeRuneInString(str)
// r contains the first rune of the string
// size is the size of the rune in bytes

If the string is encoded in UTF-8, there is no direct way to access the nth rune of the string, because the size of the runes (in bytes) is not constant. If you need this feature, you can easily write your own helper function to do it (with for...range, or with the unicode/utf8 package).
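Such a helper could look roughly like the sketch below. The names runeAt and runeAtDecode are hypothetical (nothing in the standard library is called that); one version uses for...range and the other uses utf8.DecodeRuneInString. Both take O(n) time, and both report false when the index is out of range.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt is a hypothetical helper that returns the n-th rune (0-based) of s.
// The second result is false when n is negative or past the last rune.
func runeAt(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	// range decodes one rune per iteration; we count runes, not byte offsets.
	for _, r := range s {
		if n == 0 {
			return r, true
		}
		n--
	}
	return 0, false
}

// runeAtDecode does the same with utf8.DecodeRuneInString, consuming the
// string one rune at a time.
func runeAtDecode(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if n == 0 {
			return r, true
		}
		s = s[size:] // skip the bytes of the rune just decoded
		n--
	}
	return 0, false
}

func main() {
	r, ok := runeAt("你好", 1)
	fmt.Println(string(r), ok) // 好 true
}
```

If the same string is indexed many times, converting it once with []rune(str) and indexing the slice is probably simpler, at the cost of an extra allocation.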

Didier Spezia
  • Thanks for your help. The second way can only get the first Unicode character, so it seems incomplete. I understand the first way, and I think I can modify it to solve my problem. I still wonder whether there is an easier way to get a Unicode character by index from a string. – 赵浩翔 May 15 '15 at 16:18
  • 1
    Correct, but I suspect that in most cases it really isn't any kind of performance bottleneck, and it'd be easy to optimize afterwards when you've actually profiled your code and deemed it necessary. Of course, there are situations where it's *obviously* the wrong way to do things. – LemurFromTheId May 15 '15 at 17:28
  • So it seems to me that for a given Go string, there is no way to retrieve a rune directly by index (by which I mean in O(1) time); we have to either use `for range` or convert it to `[]rune` first, both of which take O(n) time. Is that correct? – ibic May 13 '18 at 03:10
  • 2
    I would suggest that if you need to index into the runes of a string many times in your program, you convert it once to []rune in O(n), and then you can index as many times as you want in O(1) time. It's likely that the string in question will have at least one O(n) operation performed on it anyway at some point (even if it's just initial assignment), so adding another will probably not impact the overall asymptotic run-time of your program. – Jason Carlson Oct 06 '18 at 12:30
2

You can use the utf8string package:

package main
import "golang.org/x/exp/utf8string"

func main() {
   s := utf8string.NewString("ÄÅàâäåçèéêëìîïü")
   // example 1
   r := s.At(1)
   println(r == 'Å')
   // example 2
   t := s.Slice(1, 3)
   println(t == "Åà")
}

https://pkg.go.dev/golang.org/x/exp/utf8string

Zombo
-2

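You can do this, counting runes yourself because the index that range yields is a byte offset, not a rune index:

package main

import "fmt"

func main() {
  str := "cat"
  var s rune
  n := 0 // rune counter
  for _, c := range str {
    if n == 1 {
      s = c
      break
    }
    n++
  }
  fmt.Println(string(s))
}

s is now equal to 'a'.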

devin