
I use a Go HTTP request to get JSON output, as follows. The web service I am trying to access is Microsoft Translator: https://msdn.microsoft.com/en-us/library/dn876735.aspx

//Data struct of TransformTextResponse
type TransformTextResponse struct {
    ErrorCondition   int    `json:"ec"`       // A positive number representing an error condition
    ErrorDescriptive string `json:"em"`       // A descriptive error message
    Sentence         string `json:"sentence"` // transformed text
}


// some code ....
// Schedule the close before reading so the body is released on every return path.
defer response.Body.Close()
body, err := ioutil.ReadAll(response.Body)
if err != nil {
    return "", tracerr.Wrap(err)
}

transTransform := TransformTextResponse{}
err = json.Unmarshal(body, &transTransform)
if err != nil {
    return "", tracerr.Wrap(err)
}

I got this error: invalid character 'ï' looking for beginning of value

So I tried printing the body as a string with fmt.Println(string(body)), and it shows:

{"ec":0,"em":"OK","sentence":"This is too strange i just want to go home soon"}

The data doesn't seem to have any problem, so I tried to create the same value with json.Marshal:

transTransform := TransformTextResponse{}
transTransform.ErrorCondition = 0
transTransform.ErrorDescriptive = "OK"
transTransform.Sentence = "This is too strange i just want to go home soon"
jbody, _ := json.Marshal(transTransform)

I suspected the original data might have a problem, so I compared the two in []byte format.

Data from response.Body:

[239 187 191 123 34 101 99 34 58 48 44 34 101 109 34 58 34 79 75 34 44 34 115 101 110 116 101 110 99 101 34 58 34 84 104 105 115 32 105 115 32 116 111 111 32 115 116 114 97 110 103 101 32 105 32 106 117 115 116 32 119 97 110 116 32 116 111 32 103 111 32 104 111 109 101 32 115 111 111 110 34 125]

Data from json.Marshal

[123 34 101 99 34 58 48 44 34 101 109 34 58 34 79 75 34 44 34 115 101 110 116 101 110 99 101 34 58 34 84 104 105 115 32 105 115 32 116 111 111 32 115 116 114 97 110 103 101 32 105 32 106 117 115 116 32 119 97 110 116 32 116 111 32 103 111 32 104 111 109 101 32 115 111 111 110 34 125]

Any idea how I can parse this response.Body and unmarshal it into the data structure?

Evan Lin
    The first three bytes are the [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) (239 187 191) for UTF-8. Have the server *not* include the UTF-8 BOM or toss out the first 3 bytes before processing the rest (which is the actual UTF-8 encoded string). – user2864740 Jul 14 '15 at 05:16
    I got a similar error message: "invalid character... looking for beginning of value" when binding a request body to a model. The error in my case was client-side: The request payload data were sent with the wrong content type (not as json). – koks der drache Apr 19 '21 at 10:15

1 Answer


The server is sending you a UTF-8 text string with a Byte Order Mark (BOM). The BOM identifies that the text is UTF-8 encoded, but it should be removed before decoding.

This can be done with the following line (using package "bytes"):

body = bytes.TrimPrefix(body, []byte("\xef\xbb\xbf")) // Or []byte{239, 187, 191}

PS. The error refers to ï because the UTF-8 BOM, interpreted as an ISO-8859-1 string, produces the characters ï»¿.

ANisus
    Thank you, but I'm not sure why the Microsoft web service would respond with an extra BOM. – Evan Lin Jul 14 '15 at 07:21
    @EvanLin Welcome :) . It is strange behavior. The JSON spec (RFC 7159) explicitly states that "Implementations MUST NOT add a byte order mark". But it also says that decoding implementations "MAY ignore the presence of a byte order mark rather than treating it as an error", so trimming it away is okay. – ANisus Jul 14 '15 at 07:25
    @EvanLin "Microsoft", that's why. – OneOfOne Jul 14 '15 at 14:08