
I recently started using Go, and I have run into a problem while parsing XML.

Here is the issue:

I can successfully parse the following XML:

<Root>
<cookie name="e1">hsdhsdhs</cookie>
<cookie name="e2">sssss</cookie>
<cookie name="e3">null</cookie>
<info>
<name>sam</name>
</info>
</Root>

Here are the structs:

type Profile struct {
    RootElement xml.Name    `xml:"Root"`
    CookieList  []Cookie    `xml:"cookie"`
    Info        Information `xml:"info"`
}

type Cookie struct {
    Name  string `xml:"name,attr"`
    Value string `xml:",chardata"`
}

type Information struct {
    Name       string `xml:"name"`
}

These structs work fine with the following code:

profile := Profile{}
xml.Unmarshal([]byte(xmlString), &profile)
jsonData, _ := json.Marshal(profile)
fmt.Println(string(jsonData))

But when I keep the prolog in the XML:

<?xml version="1.0" encoding="EUC-JP"?>
<Root>
<cookie name="e1">hsdhsdhs</cookie>
<cookie name="e2">sssss</cookie>
<cookie name="e3">null</cookie>
<info>
<name>sam</name>
</info>
</Root>

then no data appears in the printed JSON.

I am not sure what the issue is with the prolog.

nagendra547

1 Answer


Before parsing a non-UTF-8 XML document you have to define a charset reader, because encoding/xml only handles UTF-8 input on its own. Thanks to golang.org/x/net/html/charset, all you need to do is replace this line:

xml.Unmarshal([]byte(xmlString), &profile)

with:

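// Needs the imports "bytes", "log" and "golang.org/x/net/html/charset".
decoder := xml.NewDecoder(bytes.NewBufferString(xmlString))
decoder.CharsetReader = charset.NewReaderLabel
if err := decoder.Decode(&profile); err != nil {
    log.Fatal(err) // do not discard the error; it explains any empty output
}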
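
A minimal sketch of why the original call printed nothing, using only the standard library: the question's snippet discards the error returned by xml.Unmarshal, and that error is what explains the empty struct. The helper name parseProfile and the shortened document in body are assumptions made for illustration; the struct definitions are the ones from the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type Cookie struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type Information struct {
	Name string `xml:"name"`
}

type Profile struct {
	RootElement xml.Name    `xml:"Root"`
	CookieList  []Cookie    `xml:"cookie"`
	Info        Information `xml:"info"`
}

// body is a shortened version of the document from the question (assumption:
// one cookie is enough to show the behaviour).
const body = `<Root><cookie name="e1">hsdhsdhs</cookie><info><name>sam</name></info></Root>`

// parseProfile is a hypothetical helper that returns the error from
// xml.Unmarshal instead of discarding it.
func parseProfile(doc string) (Profile, error) {
	var p Profile
	err := xml.Unmarshal([]byte(doc), &p)
	return p, err
}

func main() {
	// With a non-UTF-8 encoding declared and no CharsetReader set, the
	// decoder stops at the prolog and returns an error.
	_, err := parseProfile(`<?xml version="1.0" encoding="EUC-JP"?>` + body)
	fmt.Println("with prolog:", err)

	// Without the prolog the same document parses normally.
	p, err := parseProfile(body)
	fmt.Println("without prolog:", p.Info.Name, err)
}
```

With the prolog present, the printed error should mention that Decoder.CharsetReader is nil, which is the gap that setting decoder.CharsetReader fills.
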
Vadim Ashikhman
  • Thanks, that solved the problem, but I found one issue: if there are Japanese characters in the name (e.g. サム) under Information, they are printed strangely. It looks like binary characters, I guess. – nagendra547 Mar 29 '19 at 02:46
  • @nagendra547 I just tested these characters in the XML and they come out fine in the JSON. By name, do you mean the `name` attribute of the `cookie` tag? – Vadim Ashikhman Mar 29 '19 at 11:09
  • The name under the info tag. – nagendra547 Mar 31 '19 at 22:44