Hashing reflect.Type

Question

I'm trying to find a quick way of performing comparisons between two []reflect.Type. Right now I have the following:

func Equal(left, right []reflect.Type) bool {
    if len(left) != len(right) {
        return false
    }
    for i := 0; i < len(left); i++ {
        if left[i] != right[i] {
            return false
        }
    }
    return true
}

Most of the slices don't change. So if I can find a way to hash them, I'd get a huge perf boost.

Background

I'm trying to (for fun) implement a form of function overloading in Go using the reflect package. The first thing I did is to convert each specialised/overloaded function into a signature type.

type Signature struct {
    Variadic bool
    In, Out  []reflect.Type
}

The idea is that, when the overloaded function gets called, I'll convert the arguments into a slice of reflect.Type and then find a Signature where the In types match.
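
A minimal sketch of that lookup as described, using the element-by-element comparison from above. The helpers `typesOf` and `find` are hypothetical names introduced here for illustration, and variadic handling is left out.

```go
package main

import (
	"fmt"
	"reflect"
)

// Signature mirrors the struct from the question.
type Signature struct {
	Variadic bool
	In, Out  []reflect.Type
}

// typesOf (hypothetical helper) maps call arguments to their dynamic types.
func typesOf(args ...interface{}) []reflect.Type {
	types := make([]reflect.Type, len(args))
	for i, arg := range args {
		types[i] = reflect.TypeOf(arg)
	}
	return types
}

// equal reports whether two type slices match element by element.
func equal(left, right []reflect.Type) bool {
	if len(left) != len(right) {
		return false
	}
	for i := range left {
		if left[i] != right[i] {
			return false
		}
	}
	return true
}

// find (hypothetical helper) scans every overload in turn,
// so one call costs O(num_args * num_overloads) comparisons.
func find(sigs []Signature, in []reflect.Type) (int, bool) {
	for i, sig := range sigs {
		if equal(sig.In, in) {
			return i, true
		}
	}
	return -1, false
}

func main() {
	sigs := []Signature{
		{In: typesOf(0, "")},   // f(int, string)
		{In: typesOf("", 1.5)}, // f(string, float64)
	}
	idx, ok := find(sigs, typesOf("x", 2.0))
	fmt.Println(idx, ok)
}
```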

This works, but every lookup is a linear scan over all the signatures, which is pretty slow. If I could hash the []reflect.Type slice, I could use the hash as a map key and get constant-time lookups.

Hashing a series of values requires a linear scan over that series of values. I'm not sure how you envision that being faster than the short-circuiting comparison you have already. — JimB, Apr 03 '18 at 15:59
@JimB without the hashing, it's a linear scan for every overload. `O(num_args * num_overloads)` vs `O(num_args)`. As I said in the question, most of the slices don't change. So the hash values will be pre-computed/cached. — Ilia Choly, Apr 03 '18 at 16:01
Yes, but you need to calculate the hash in order to compare it, which means a linear scan. The lookup speed of a map isn't because of the hash comparison, it's because of the O(1) lookup time within the map. Constant-time lookups do not mean constant-time hashing. — JimB, Apr 03 '18 at 16:10
Maybe it's just the obvious name you need, a type can be uniquely identified by `t.PkgPath()` + `t.Name()`. — JimB, Apr 03 '18 at 16:12
`reflect.Type` is an interface; the dynamic value stored in it is in most cases a pointer, which means comparing two `reflect.Type` values is just comparing pointers. If you need better performance, I would make sure the `Signature` values themselves are "interned" or have a unique identifier which can be used for simple and efficient comparison / lookup. — icza, Apr 03 '18 at 16:15
@JimB I'm reusing the hashed values, I don't know how else to explain it ... — Ilia Choly, Apr 03 '18 at 16:23
@IliaCholy, that's fine, it's just not how you've described the problem. Just make each Signature return a unique, comparable value. — JimB, Apr 03 '18 at 16:32
@JimB I asked for a way to hash a `reflect.Type`. Pretty cut and dry. — Ilia Choly, Apr 03 '18 at 18:14

Ilia Choly · Answer 1 · 2018-04-03T20:25:59.193

I ended up abusing the built-in map to assign unique ids to each reflect.Type. Then I hash those using djb2.

type TypeCode struct {
    seq   int64
    codes map[reflect.Type]int64
}

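// TypeID returns a stable sequential id for t, assigning a new one on first sight.
func (td *TypeCode) TypeID(t reflect.Type) int64 {
    if code, ok := td.codes[t]; ok {
        return code
    }
    if td.codes == nil {
        // Writing to a nil map panics, so allocate it on first use.
        td.codes = make(map[reflect.Type]int64)
    }
    td.seq++
    td.codes[t] = td.seq
    return td.seq
}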

func (td *TypeCode) SliceTypeID(tt []reflect.Type) int64 {
    id := int64(5381)
    for _, t := range tt {
        id = ((id << 5) + id) + td.TypeID(t)
    }
    return id
}
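
A quick check of why collisions are possible with this scheme: since each step computes `id*33 + code`, two slices of the same length collide whenever those sums agree. The sketch below applies the same arithmetic to raw ids rather than real types. The function name `djb2` and the chosen ids are illustrative assumptions. The ids 1, 2 and 34 would be handed out once 34 distinct types have been seen.

```go
package main

import "fmt"

// djb2 combines ids the same way as SliceTypeID: id = id*33 + code.
func djb2(codes []int64) int64 {
	id := int64(5381)
	for _, code := range codes {
		id = ((id << 5) + id) + code
	}
	return id
}

func main() {
	a := djb2([]int64{1, 34}) // 1*33 + 34 = 67 added on top of the seed term
	b := djb2([]int64{2, 1})  // 2*33 + 1  = 67 as well
	fmt.Println(a, b, a == b) // prints "5859976 5859976 true"
}
```

So a map keyed on this value alone could confuse two different signatures unless the slices are also compared on a hit.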

Edit: I switched to a string-based approach, which is less efficient but removes any potential for collisions.

type TypeCode struct {
    seq   int64
    codes map[reflect.Type]string
}

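// TypeID returns a stable decimal id for t, assigning a new one on first sight.
func (td *TypeCode) TypeID(t reflect.Type) string {
    if code, ok := td.codes[t]; ok {
        return code
    }
    if td.codes == nil {
        // Writing to a nil map panics, so allocate it on first use.
        td.codes = make(map[reflect.Type]string)
    }
    td.seq++
    id := strconv.FormatInt(td.seq, 10)
    td.codes[t] = id
    return id
}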

func (td *TypeCode) SliceTypeID(tt []reflect.Type) string {
    ids := make([]string, len(tt))
    for i := 0; i < len(tt); i++ {
        ids[i] = td.TypeID(tt[i])
    }
    return strings.Join(ids, ".")
}
