How to convert a rune to unicode-style-string like `\u554a` in Golang?

Question

If you run fmt.Println("\u554a"), it shows '啊'.

But how do I get the Unicode-style string \u554a from the rune '啊'?

It is very common to use `\uXXXX`-style escapes instead of non-ASCII characters like '世界' in JSON data. Try loading the JSON `{"one": "\u554a ", "two": "啊"}` with `jquery.getJSON()`. On the page, one displays correctly, but two shows garbled text. — hardPass, May 23 '13 at 05:13
It is not common. You are mistaken about JSON. JSON consists of a sequence of characters. Any non-ASCII character can be used in JSON. You are probably using JSON wrong. — newacct, May 23 '13 at 08:54
It is true that non-ASCII characters can be used in JSON. But not every business system handles UTF-8 encoding. How do you deal with data in different encodings from different systems? Maybe it is not common for you. I guess you have a better idea. — hardPass, May 24 '13 at 16:00
whenever you transfer text, both sides need to know exactly what encoding is being used — newacct, May 24 '13 at 18:13
@newacct Let me share a case from a friend. An AD provider supplies AD JSON (UTF-8) to a group of s-agents, and the s-agents then pass the AD on to the final websites. The whole chain works well, except that a few s-agents pass the AD JSON along in various encodings, which causes garbled display on the final pages. They don't much care what causes the encoding to change inside the s-agent systems; it may be some legacy requirement or just bugs. What they really care about is fixing it effectively. So they convert runes to Unicode-style escapes in the root system, and all pages display well now. — hardPass, May 27 '13 at 06:43
Whether it is common or not is not very important. There may well be better ways than this conversion. In my business, EDI, problems like this show up often, and this Unicode-style conversion is something of a best practice now. Of course, I did ask the clients why they don't use UTF-8 as a standard. — hardPass, May 27 '13 at 06:45
If you have data and don't know what encoding it is, it is completely useless. You cannot use it at all. You may be thinking that all encodings are ASCII-compatible. But most of the encodings in the world are not ASCII-compatible. — newacct, May 27 '13 at 06:46
I actually want to do it in the opposite way, which is to get '啊' from '\u554a' and print it out. I just wonder whether this is feasible. — 赣西狠人, Mar 13 '21 at 15:13

score 18 · Answer 1 · answered May 22 '13 at 03:19

package main

import "fmt"
import "strconv"

func main() {
    quoted := strconv.QuoteRuneToASCII('啊') // quoted = "'\u554a'"
    unquoted := quoted[1:len(quoted)-1]      // unquoted = "\u554a"
    fmt.Println(unquoted)
}

This outputs:

\u554a
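The same quote-then-strip idea extends to whole strings. The sketch below is an illustration, not part of the original answer: `escapeNonASCII` is a hypothetical helper name, and it relies on `strconv.QuoteToASCII`, which also escapes double quotes, backslashes, and control characters, so the result is a Go string literal body rather than guaranteed-valid JSON.

```go
package main

import (
	"fmt"
	"strconv"
)

// escapeNonASCII is a hypothetical helper (not from the thread). It returns s
// with every non-ASCII rune written as a Go-style escape, after stripping the
// surrounding double quotes that strconv.QuoteToASCII adds.
// Caveat: QuoteToASCII also escapes '"', '\' and control characters.
func escapeNonASCII(s string) string {
	quoted := strconv.QuoteToASCII(s)
	return quoted[1 : len(quoted)-1]
}

func main() {
	fmt.Println(escapeNonASCII("a啊ü")) // a\u554a\u00fc
}
```

ASCII input passes through unchanged, and short code points such as ü come out zero-padded to four hex digits.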

Darshan Rivka Whittle

There is a quoted version, `QuoteRuneToASCII`, in the official package, but why not give us a direct func without the quotes? I am afraid this is not neat enough, because of having to deal with the quotes. So I wrote my own func `RuneToAscii` in my answer. It seems more efficient. – hardPass May 23 '13 at 02:45
@hardPass I slightly prefer my way, but I like yours too, and I can see why you prefer it. It's your question, feel free to mark your own as the selected answer. – Darshan Rivka Whittle May 23 '13 at 04:21
Is there another function to go the other way? `\u554a` -> 啊? – 425nesp Dec 07 '15 at 05:39
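For the reverse direction asked about in the comments, one possible approach is `strconv.Unquote`, which interprets Go-style escapes inside a quoted string. This is a minimal sketch: `unescape` is a hypothetical helper name, and it assumes the input contains no unescaped double quotes.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescape is a hypothetical helper (not from the thread). It wraps s in
// double quotes so that strconv.Unquote can interpret escapes such as \u554a.
// Assumption: s itself contains no unescaped double quotes.
func unescape(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	s, err := unescape(`\u554a`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // 啊
}
```

If the escaped text arrives as part of a JSON document, `encoding/json` already decodes `\uXXXX` escapes while unmarshalling, so no extra step is needed there.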

score 13 · Accepted Answer · answered May 22 '13 at 05:49

IMHO, it should be better:

func RuneToAscii(r rune) string {
    if r < 128 {
        return string(r)
    } else {
        return "\\u" + strconv.FormatInt(int64(r), 16)
    }
}

hardPass

score 4 · Answer 3 · answered May 22 '13 at 03:22

You can use fmt.Sprintf along with %U to get the hexadecimal value:

test := fmt.Sprintf("%U", '啊')  // test = "U+554A"
fmt.Println("\\u" + test[2:]) // Prints \u554A

laurent

You're right. Just converting it to its hexadecimal value should be OK. And this func should be more efficient: `func RuneToAscii(r rune) string`, shown in my answer. – hardPass May 23 '13 at 02:34
Sprintf(), like all of the printf() functions, use reflection to determine the argument types. Reflection is generally an expensive operation compared to a special purpose function like RuneToAscii() or QuoteRuneToASCII() that already knows the data types. Yes, we are talking milliseconds or less here, but if you are doing this in a loop of 10s of thousands, those milliseconds add up. Just my two cents. – Ronald Currier Jun 09 '20 at 17:07

score 1 · Answer 4 · answered May 22 '13 at 03:15

For example,

package main

import "fmt"

func main() {
    r := rune('啊')
    u := fmt.Sprintf("%U", r)
    fmt.Println(string(r), u)
}

Output:

啊 U+554A

peterSO

It is common to use `\u554A` in JSON, not `U+554A`. To get `\u554A`, you still need to do some extra operations. It is not neat enough. – hardPass May 23 '13 at 02:51

score 1 · Answer 5 · answered May 26 '13 at 15:01

fmt.Printf("\\u%X", '啊')

http://play.golang.org/p/Jh9ns8Qh15

(Upper or lowercase 'x' will control the case of the hex characters)

As hinted at by package fmt's documentation:

%U Unicode format: U+1234; same as "U+%04X"

Dijkstra

When the input is already ASCII, I'd like it returned as is: for an input of 'a', return 'a' instead of '\u61'. I should have mentioned this earlier. It seems that you guys do not see this as a common requirement. I was really wondering about this. Ah, but it's OK, since I've got a func that does the job. – hardPass May 27 '13 at 05:44
This should be the chosen answer -- no need for the `strconv` package at all. – xpt Sep 07 '17 at 21:32

score 1 · Answer 6 · answered May 03 '19 at 12:05

package main

import "fmt"

func main() {
    fmt.Printf("%+q", '啊')
}

m1kael

mowzy · Answer 7 · 2019-01-02T20:54:39.257

I'd like to add to the answer that hardPass has.

In the case where the hex representation of the code point is fewer than 4 characters (ü, for example), strconv.FormatInt will produce \ufc, which is a Unicode escape syntax error in Go, as opposed to the full \u00fc that Go understands.

Padding the hex with zeros using fmt.Sprintf with hex formatting will fix this:

func RuneToAscii(r rune) string {
    if r < 128 {
        return string(r)
    } else {
        return fmt.Sprintf("\\u%04x", r)
    }
}

https://play.golang.org/p/80w29oeBec1
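A related edge case is runes above U+FFFF, such as emoji: `%04x` then yields five or more hex digits, which a JSON parser would not read as one escape, because JSON writes such characters as a UTF-16 surrogate pair. The sketch below is one possible extension, not something from the answers above; `runeToJSONEscape` is a hypothetical name, and it deliberately ignores the quotes, backslashes, and control characters that a full JSON encoder would also escape.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// runeToJSONEscape is a hypothetical variant of RuneToAscii (not from the
// thread). ASCII is returned unchanged, runes up to U+FFFF become \uXXXX,
// and larger runes become a UTF-16 surrogate pair \uXXXX\uXXXX.
// Caveat: '"', '\' and control characters are NOT escaped here.
func runeToJSONEscape(r rune) string {
	switch {
	case r < 128:
		return string(r)
	case r > 0xFFFF:
		hi, lo := utf16.EncodeRune(r)
		return fmt.Sprintf("\\u%04x\\u%04x", hi, lo)
	default:
		return fmt.Sprintf("\\u%04x", r)
	}
}

func main() {
	fmt.Println(runeToJSONEscape('a'))  // a
	fmt.Println(runeToJSONEscape('ü'))  // \u00fc
	fmt.Println(runeToJSONEscape('😀')) // \ud83d\ude00
}
```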

score 0 · Answer 8 · answered Apr 23 '19 at 00:30

This would do the job..

package main

import (
    "fmt"
)

func main() {
    str := fmt.Sprintf("%s", []byte{0x80})
    fmt.Println(str)
}

Ritwik
