21

Running fmt.Println("\u554a") prints 啊.

How can I go the other way and get the escaped string \u554a from the rune '啊'?

hardPass
  • why do you want to do this? – newacct May 22 '13 at 22:10
  • It is very common to use `\uXXXX`-style escapes instead of non-ASCII characters like '世界' in JSON data. Try loading the JSON data `{"one": "\u554a ", "two": "啊"}` with `jquery.getJSON()`. On the page, you will find that one is fine but two shows garbled text. – hardPass May 23 '13 at 05:13
  • It is not common. You are mistaken about JSON. JSON consists of a sequence of characters. Any non-ASCII character can be used in JSON. You are probably using JSON wrong. – newacct May 23 '13 at 08:54
  • It is true that non-ASCII characters can be used in JSON. But not every business system handles UTF-8 encoding. How do you deal with data in different encodings from different systems? Maybe it is not common for you. I guess you have a better idea. – hardPass May 24 '13 at 16:00
  • whenever you transfer text, both sides need to know exactly what encoding is being used – newacct May 24 '13 at 18:13
  • @newacct Let me share a case from a friend. An AD provider supplies AD JSON (UTF-8) to a group of s-agents, which then pass the AD on to the end websites. The whole chain works well, except that a few s-agents pass the AD JSON on in various encodings, which causes garbled display on the end pages. They don't much care what changes the encoding inside the s-agent systems; it may be some legacy requirement or just bugs. What they really care about is fixing it effectively. So they convert runes to the unicode-style form in the root system, and all pages display well now. – hardPass May 27 '13 at 06:43
  • Whether it is common or not is not very important. There may well be better ways than this conversion. In my business, EDI, problems like this show up often, and this unicode-style conversion is something of a best practice now. Of course, I did ask the clients why they don't use UTF-8 as a standard. – hardPass May 27 '13 at 06:45
  • If you have data and don't know what encoding it is, it is completely useless. You cannot use it at all. You may be thinking that all encodings are ASCII-compatible. But most of the encodings in the world are not ASCII-compatible. – newacct May 27 '13 at 06:46
  • I actually want to do it in the opposite way, which is to get '啊' from '\u554a' and print it out. I just wonder whether this is feasible. – 赣西狠人 Mar 13 '21 at 15:13
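
On the JSON point debated in the comments above: a conforming JSON parser treats the escaped form and the literal form of a character as the same string. A minimal sketch using Go's encoding/json to illustrate this; the helper name `sameAfterDecode` is made up for the example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sameAfterDecode is a hypothetical helper: it decodes a JSON object of
// strings and reports whether the values under keys a and b are equal.
func sameAfterDecode(data, a, b string) (bool, error) {
	var m map[string]string
	if err := json.Unmarshal([]byte(data), &m); err != nil {
		return false, err
	}
	return m[a] == m[b], nil
}

func main() {
	// The raw string keeps \u554a as six literal characters, the way it
	// would appear on the wire.
	same, err := sameAfterDecode(`{"one": "\u554a", "two": "啊"}`, "one", "two")
	fmt.Println(same, err) // true <nil>
}
```

So the escaped form only makes a difference when some hop in the pipeline mishandles the byte encoding, which is the situation hardPass describes.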

8 Answers

18
package main

import "fmt"
import "strconv"

func main() {
    quoted := strconv.QuoteRuneToASCII('啊') // quoted = "'\u554a'"
    unquoted := quoted[1:len(quoted)-1]      // unquoted = "\u554a"
    fmt.Println(unquoted)
}

This outputs:

\u554a
Darshan Rivka Whittle
  • The official package has a quoted version (`QuoteRuneToASCII`), but why not give us a direct func without the quotes? I am afraid this is not neat enough, because of having to deal with the quotes. So I posted a func `RuneToAscii` in my own answer. It seems more efficient. – hardPass May 23 '13 at 02:45
  • @hardPass I slightly prefer my way, but I like yours too, and I can see why you prefer it. It's your question, feel free to mark your own as the selected answer. – Darshan Rivka Whittle May 23 '13 at 04:21
  • Is there another function to go the other way? `\u554a` -> 啊? – 425nesp Dec 07 '15 at 05:39
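
For the reverse direction asked about in the comments (`\u554a` -> 啊), one option is to let `strconv.Unquote` interpret the escape by wrapping the text in double quotes. This is only a sketch: `unescape` is a hypothetical helper name, and it assumes the input contains no unescaped double quote.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescape is a hypothetical helper. It wraps s in double quotes so that
// strconv.Unquote parses it like a Go string literal and resolves \uXXXX.
// Assumption: s contains no unescaped double quote.
func unescape(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	s, err := unescape(`\u554a`)
	if err != nil {
		fmt.Println("cannot unescape:", err)
		return
	}
	fmt.Println(s) // 啊
}
```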
13

IMHO, this would be better:

func RuneToAscii(r rune) string {
    if r < 128 {
        return string(r)
    } else {
        return "\\u" + strconv.FormatInt(int64(r), 16)
    }
}
hardPass
4

You can use fmt.Sprintf along with %U to get the hexadecimal value:

test := fmt.Sprintf("%U", '啊')      // test = "U+554A"
fmt.Println("\\u" + test[2:]) // prints \u554A
laurent
  • You're right. Simply converting it to a hexadecimal value should be enough. And this func should be more efficient: `func RuneToAscii(r rune) string`, shown in my answer. – hardPass May 23 '13 at 02:34
  • Sprintf(), like all of the printf() functions, uses reflection to determine the argument types. Reflection is generally an expensive operation compared to a special-purpose function like RuneToAscii() or QuoteRuneToASCII() that already knows the data types. Yes, we are talking milliseconds or less here, but if you are doing this in a loop of tens of thousands, those milliseconds add up. Just my two cents. – Ronald Currier Jun 09 '20 at 17:07
1

For example,

package main

import "fmt"

func main() {
    r := rune('啊')
    u := fmt.Sprintf("%U", r)
    fmt.Println(string(r), u)
}

Output:

啊 U+554A
peterSO
  • It is common to use `\u554A` in JSON, not `U+554A`. To get `\u554A`, you still need to do some extra operations. It is not neat enough. – hardPass May 23 '13 at 02:51
1
fmt.Printf("\\u%X", '啊')

http://play.golang.org/p/Jh9ns8Qh15

(An uppercase or lowercase 'x' in the verb controls the case of the hex digits.)

As hinted at by package fmt's documentation:

%U Unicode format: U+1234; same as "U+%04X"

Dijkstra
  • When the input is already ASCII, I'd like it returned as is: for the input 'a', return 'a' instead of '\u61'. I should have mentioned this earlier. It seems that you do not consider this a common requirement. I was really wondering about this. But it's OK, I now have a func that does the job. – hardPass May 27 '13 at 05:44
  • This should be the chosen answer -- there is no need for the `strconv` package at all. – xpt Sep 07 '17 at 21:32
1
package main

import "fmt"

func main() {
    fmt.Printf("%+q", '啊')
}
m1kael
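
For reference, `%+q` keeps the quotes of a Go literal, so the rune form prints with single quotes and the string form with double quotes. A small sketch of what to expect and of stripping the quotes; `escapedRune` is a hypothetical helper name, not part of the answer above:

```go
package main

import "fmt"

// escapedRune is a hypothetical helper: it formats r with %+q, which
// yields an ASCII-only single-quoted literal, and drops the two quotes.
func escapedRune(r rune) string {
	q := fmt.Sprintf("%+q", r)
	return q[1 : len(q)-1]
}

func main() {
	fmt.Printf("%+q\n", '啊')       // '\u554a'
	fmt.Printf("%+q\n", "a啊")      // "a\u554a"
	fmt.Println(escapedRune('啊')) // \u554a
}
```

ASCII runes pass through unchanged with this approach, but note that `%+q` also applies Go's other escapes (for example to control characters and the quote character itself).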
0

I'd like to add to hardPass's answer.

When the hex representation of the code point has fewer than 4 digits (ü, for example), strconv.FormatInt produces \ufc, which is a syntax error as an escape in Go, as opposed to the full \u00fc that Go understands.

Padding the hex with zeros using fmt.Sprintf with hex formatting fixes this:

func RuneToAscii(r rune) string {
    if r < 128 {
        return string(r)
    } else {
        return fmt.Sprintf("\\u%04x", r)
    }
}

https://play.golang.org/p/80w29oeBec1

mowzy
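
One case the padded version still does not cover: code points above U+FFFF come out with five or more hex digits (for example \u1f600), which JSON and JavaScript parsers do not read as a single character. If the target is JSON, such runes are conventionally written as a UTF-16 surrogate pair. A sketch under that assumption, applied to a whole string; `EscapeNonASCII` is a hypothetical name, and it does not escape quotes, backslashes or control characters:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// EscapeNonASCII is a hypothetical helper. ASCII runes are copied as is,
// runes up to U+FFFF become \uXXXX, and runes above U+FFFF become a
// UTF-16 surrogate pair \uXXXX\uXXXX (the form JSON expects).
func EscapeNonASCII(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r < 128:
			b.WriteRune(r)
		case r > 0xFFFF:
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&b, `\u%04x\u%04x`, hi, lo)
		default:
			fmt.Fprintf(&b, `\u%04x`, r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(EscapeNonASCII("a啊ü😀")) // a\u554a\u00fc\ud83d\ude00
}
```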
0

This would do the job..

package main

import (
    "fmt"
)

func main() {
    str := fmt.Sprintf("%s", []byte{0x80})
    fmt.Println(str)
}
Ritwik