
Given two Go objects x and y such that x is equal to y (assuming no trickiness with interfaces and maps, just structs and arrays), can we expect the output of gob_encode(x) and gob_encode(y) to always be the same?

edit (Jun 8 2018):

gob encoding is non-deterministic when maps are involved. Map iteration order in Go is random, so the map entries are serialised in a random order.
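A minimal sketch of how this can be observed: encode the same map many times with fresh encoders and count the distinct byte sequences. The helper names `encodeOnce` and `distinctEncodings` are made up for this illustration, and the exact count varies from run to run.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeOnce gob-encodes v with a fresh encoder and returns the bytes.
func encodeOnce(v interface{}) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// distinctEncodings encodes the same map n times and counts how many
// different byte sequences come out.
func distinctEncodings(n int) int {
	m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8}
	seen := make(map[string]bool)
	for i := 0; i < n; i++ {
		seen[string(encodeOnce(m))] = true
	}
	return len(seen)
}

func main() {
	// A count above 1 means equal maps produced different encodings.
	fmt.Println("distinct encodings:", distinctEncodings(100))
}
```

If the count is greater than 1, equal maps have produced different gob output within a single run.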

dpington

3 Answers


You shouldn't really care as long as it "gets the job done". That said, the current encoding/gob implementation is deterministic, with some caveats (keep reading).

Since:

A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

This means that when you encode a value of a given type for the first time, its type information is sent along with it. If you then encode another value of the same type, the type description is not transmitted again, only a reference to the earlier spec. So even if you encode the same value twice on one encoder, the two calls produce different byte sequences: the first contains the type spec and the value, the second only a type reference (e.g. a type id) and the value.

See this example:

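package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type Int struct{ X int }

func main() {
    b := &bytes.Buffer{}
    e := gob.NewEncoder(b)

    // Encode the same value three times on one encoder. The buffer is never
    // reset, so each Println shows everything written so far.
    for i := 0; i < 3; i++ {
        if err := e.Encode(Int{1}); err != nil {
            panic(err)
        }
        fmt.Println(b.Bytes())
    }
}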

Output (try it on the Go Playground):

[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0]
[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0 5 255 130 1 2 0]
[23 255 129 3 1 1 3 73 110 116 1 255 130 0 1 1 1 1 88 1 4 0 0 0 5 255 130 1 2 0 5 255 130 1 2 0 5 255 130 1 2 0]

As you can see, the first Encode() writes a long type description followed by the encoded Int value, [5 255 130 1 2 0]. The second and third calls append only that same [5 255 130 1 2 0] sequence.
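To make the difference easier to see, here is a small sketch that records how many bytes each Encode() call appends to the stream. The helper name `messageSizes` is invented for this illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Int struct{ X int }

// messageSizes encodes Int{1} n times on one encoder and returns how many
// bytes each Encode call appended to the stream.
func messageSizes(n int) []int {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	sizes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		before := buf.Len()
		if err := enc.Encode(Int{1}); err != nil {
			panic(err)
		}
		sizes = append(sizes, buf.Len()-before)
	}
	return sizes
}

func main() {
	// The first entry includes the type description; later ones do not.
	fmt.Println(messageSizes(3))
}
```

Going by the byte listings above, the first call accounts for 30 bytes and each later call for 6, though the exact numbers could differ between Go versions.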

But if you create 2 different gob.Encoders and write the same values to each in the same order, they will produce exactly the same output.

Note that "same order" matters in the previous statement. A type's specification is transmitted when the first value of that type is sent, so sending values of different types in a different order transmits the type specs in a different order too. The references/identifiers assigned to the types may then differ, which means a value of a given type may be encoded with a different type reference/id.
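A rough sketch of both statements, comparing whole streams from fresh encoders. `Point`, `Label`, `encodeAll` and `sameStream` are hypothetical names used only for this illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point and Label are hypothetical types used only for this illustration.
type Point struct{ X, Y int }
type Label struct{ Name string }

// encodeAll writes vals, in order, through a fresh encoder and returns
// the resulting stream.
func encodeAll(vals ...interface{}) []byte {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for _, v := range vals {
		if err := enc.Encode(v); err != nil {
			panic(err)
		}
	}
	return buf.Bytes()
}

// sameStream reports whether two value sequences encode to identical bytes.
func sameStream(a, b []interface{}) bool {
	return bytes.Equal(encodeAll(a...), encodeAll(b...))
}

func main() {
	p, l := Point{1, 2}, Label{"x"}
	// Same values, same order: the two streams match.
	fmt.Println(sameStream([]interface{}{p, l}, []interface{}{p, l}))
	// Same values, swapped order: the two streams differ.
	fmt.Println(sameStream([]interface{}{p, l}, []interface{}{l, p}))
}
```

This only compares complete streams within one process; it does not inspect which type id each value was given.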

Also note that the implementation of the gob package may change from release to release. Such changes will be backward compatible (any backward-incompatible change would have to be stated explicitly), but backward compatible does not mean the output stays the same. So different Go versions may produce different results, even though all of them can be decoded by every compatible version.

icza

It should probably be noted that the accepted answer is not correct: encoding/gob doesn't order map elements in a deterministic way: https://play.golang.org/p/Hh3_5Kb3Znn

I've forked encoding/gob and added some code to order maps by key before writing them to the stream. This will affect performance, but my particular application doesn't need high performance. Remember custom marshalers can break this, so use with care: https://github.com/dave/stablegob
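For cases where a fork is not an option, one alternative sketch is to flatten the map into a slice sorted by key and encode that slice with the standard encoding/gob. This is not how stablegob works; it is a different approach, it assumes string keys, and the names `Entry`, `sortedEntries`, `stableEncode` and `allEqual` are made up for this illustration. The decoding side would have to rebuild the map from the slice.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sort"
)

// Entry is a hypothetical key/value pair used to flatten a map.
type Entry struct {
	Key   string
	Value int
}

// sortedEntries returns m's contents as a slice ordered by key.
func sortedEntries(m map[string]int) []Entry {
	out := make([]Entry, 0, len(m))
	for k, v := range m {
		out = append(out, Entry{Key: k, Value: v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// stableEncode gob-encodes the sorted entries instead of the map itself.
func stableEncode(m map[string]int) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(sortedEntries(m)); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// allEqual reports whether n repeated encodings of m are byte-identical.
func allEqual(m map[string]int, n int) bool {
	first := stableEncode(m)
	for i := 1; i < n; i++ {
		if !bytes.Equal(first, stableEncode(m)) {
			return false
		}
	}
	return true
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4}
	fmt.Println(allEqual(m, 100))
}
```

Sorting costs extra time and memory per encode, the same kind of trade-off mentioned for the fork.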

David Brophy
  • Is this a recent change or has it always been like this? – dpington Jun 07 '18 at 12:27
  • It seems like it was always like this? – dpington Jun 07 '18 at 12:37
  • Map iteration order (and I guess map item order in encoding/gob) has been fully non-deterministic since at least 2014 (go 1.3): https://github.com/golang/go/issues/6719 I assume the accepted answer didn't test using maps? – David Brophy Jun 08 '18 at 01:09
  • I posted an answer in 2015 mentioning the possibility of map iteration order, only to have it pointed out to me that the question specifically asks about structs and arrays and *no maps*, so I deleted my answer. I don't mind that, but it's worth pointing out that that's why the accepted answer *isn't* incorrect. – hobbs Aug 26 '19 at 22:07

It also isn't deterministic if you use different types and different encoders.

Example:

package main

import (
    "bytes"
    "crypto/sha1"
    "encoding/gob"
    "encoding/hex"
    "log"
)

func main() {
    // Comment out encint() or encint64() and compare what encstring() logs.
    encint()
    encint64()
    encstring()
}

func encint() {
    s1 := []int{0, 2, 4, 5, 7}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
}

func encint64() {
    s1 := []int64{0, 2, 4, 5, 7}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
}

func encstring() {
    s1 := []string{"a", "b", "c", "d"}
    buf2 := bytes.Buffer{}
    enc2 := gob.NewEncoder(&buf2)
    enc2.Encode(s1)
    log.Println(buf2.Bytes())

    hash := sha1.New()
    hash.Write(buf2.Bytes())
    ret := hash.Sum(nil)
    log.Println(hex.EncodeToString(ret))
}


Notice that if you comment out encint() or encint64(), encstring() produces different bytes and a different hash.

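This happens even though each function uses its own separate encoder and buffer. The output depends on which other types the program has already encoded, not just on the encoder in use.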
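A small sketch of the flip side, assuming (as the behaviour above suggests) that a type keeps whatever id it was given the first time the process encoded it: within a single run, two fresh encoders give the same bytes for the same []string, even when an unrelated type is encoded in between. The helper names `encodeFresh` and `stableWithinProcess` are invented for this illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeFresh gob-encodes v with a brand-new encoder and buffer.
func encodeFresh(v interface{}) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// stableWithinProcess encodes the same []string before and after encoding
// an unrelated type, and reports whether the two results are identical.
func stableWithinProcess() bool {
	s := []string{"a", "b", "c", "d"}
	first := encodeFresh(s)
	_ = encodeFresh([]float64{1, 2, 3}) // unrelated type encoded in between
	second := encodeFresh(s)
	return bytes.Equal(first, second)
}

func main() {
	fmt.Println(stableWithinProcess())
}
```

So a hash of gob output is only comparable between runs that encode the same types in the same order first, which is what commenting out encint() or encint64() changes.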

hbt