My database contains text encoded as "1Â¼" or "Â¼", which I want to display as "1¼" or "¼". The text that may contain such characters is very large.

How can we achieve this using Golang?

Prashant
  • Starting point for anyone attempting to answer this question: The string "Â¼" is the result of a UTF-8/Latin-1 mixup. The bytes C2 BC are "¼" in UTF-8 and "Â¼" in Latin-1. –  Jun 25 '15 at 14:18
  • Might be useful: [GoLang - Persist using ISO-8859-1 charset](http://stackoverflow.com/questions/24555819/golang-persist-using-iso-8859-1-charset) and [Reading a non UTF-8 text file in go](http://stackoverflow.com/questions/10277933/reading-a-non-utf-8-text-file-in-go) – icza Jun 26 '15 at 06:15

1 Answer

As Wumpus points out, this looks like an encoding mixup. One easy way to fix it is to force-convert your strings back into UTF-8 from what I assume is ISO-8859-1 (Latin-1).

The string you have consists of the runes []rune{194, 188}, that is, "Â" (U+00C2) followed by "¼" (U+00BC).

Encoded as UTF-8, it has the concrete bytes []byte{195, 130, 194, 188}.
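
These values can be checked with a few lines of Go. This is only a minimal sketch: `describe` is a hypothetical helper name, and the escaped literal "\u00c2\u00bc" ("Â¼") stands in for whatever actually comes out of your database.

```go
package main

import "fmt"

// describe is a hypothetical helper that formats the runes and the
// UTF-8 bytes of s so the two views can be compared side by side.
func describe(s string) string {
	return fmt.Sprintf("%v %v", []rune(s), []byte(s))
}

func main() {
	// Assumed stand-in for the mis-decoded text: "Â¼".
	mistaken := "\u00c2\u00bc"

	fmt.Println(describe(mistaken)) // [194 188] [195 130 194 188]
	fmt.Println(describe("¼"))      // [188] [194 188]
}
```

Note that the runes of the broken string are exactly the bytes of the string you want.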

To display it correctly, the string needs to have the correct bytes, []byte{194, 188}, which is the UTF-8 encoding of "¼". Essentially, your string stores each original byte as a separate rune, so we need to reverse that.

mistaken := "Â¼" // Your erroneous string
correct := make([]byte, 0, len(mistaken))
for _, r := range mistaken { // Range by runes
    correct = append(correct, byte(r)) // Force conversion to byte (0-255)
}
fmt.Println(string(correct)) // Should print "¼"
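
Since the text is large and may mix damaged and undamaged strings, a slightly more defensive variant may be safer. The following is a sketch under assumptions, not a definitive fix: `fixMojibake` is a hypothetical helper name, and it assumes the damage is exactly one round of UTF-8 bytes decoded as Latin-1. It leaves a string untouched if any rune does not fit in one byte or if the recovered bytes are not valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// fixMojibake is a hypothetical helper: it undoes one round of
// "UTF-8 bytes decoded as Latin-1" and returns s unchanged when s
// does not look like that kind of damage.
func fixMojibake(s string) string {
	buf := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			// Not representable as a single Latin-1 byte, so s was
			// not produced by this particular mixup.
			return s
		}
		buf = append(buf, byte(r))
	}
	if !utf8.Valid(buf) {
		// The recovered bytes are not valid UTF-8; leave s alone.
		return s
	}
	return string(buf)
}

func main() {
	fmt.Println(fixMojibake("1\u00c2\u00bc cup")) // 1¼ cup
	fmt.Println(fixMojibake("plain ASCII"))       // plain ASCII
	fmt.Println(fixMojibake("\u00bc"))            // ¼ (already correct, left as-is)
}
```

Applying this per row or per line keeps memory use bounded for large text. If the mixup went through Windows-1252 rather than Latin-1, some characters map to code points above 255 and this sketch will leave those strings unchanged.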

As for what might be causing this problem, are you reading in the text from your database with the correct encoding?

Danver Braganza