I have a protobuf definition to handle paged results from an API:

message ArrayRespone {
    int32 count = 1;
    string next_url = 2;
    string request_id = 3;
    repeated google.protobuf.Any results = 4;
    string status = 5;
}

The goal here is to deserialize the paged responses from this API and then extract the results from each page into slices of the appropriate type. I wrote code in Go that does this:

func getData[T proto.Message](data []byte) ([]T, error) {

    var resp *ArrayRespone
    if err := json.Unmarshal(data, &resp); err != nil {
        return nil, err
    }
    
    var items []T
    for _, result := range resp.Results {
        var item T
        if err := result.UnmarshalTo(item); err != nil {
            return nil, err
        }

        items = append(items, item)
    }

    return items, nil
}

When I test this code, I get the following error:

proto: mismatched message type: got "X", want ""

From this, I understand that Protobuf doesn't have the information it needs to determine which type it's working with. Looking at the definition for Any, I can see that it has a TypeUrl field and a Value field, and the type URL is empty when it shouldn't be. I considered setting it to X manually, but that wouldn't help because the Value field is also empty: my JSON data had been ignored entirely.
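
The silent data loss can be reproduced without protobuf at all. The sketch below uses a hypothetical anyLike struct that mirrors the two JSON keys (type_url and value) that, as far as I can tell, the generated Any struct exposes to encoding/json. The payload keys are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anyLike is a hypothetical stand-in for the generated Any struct. The
// assumption is that encoding/json only sees "type_url" and "value" keys.
type anyLike struct {
	TypeUrl string `json:"type_url,omitempty"`
	Value   []byte `json:"value,omitempty"`
}

// decodeAnyLike unmarshals one result object into the stand-in type.
func decodeAnyLike(data []byte) (anyLike, error) {
	var a anyLike
	err := json.Unmarshal(data, &a)
	return a, err
}

func main() {
	// A result object as a plain JSON API might return it (illustrative keys).
	a, err := decodeAnyLike([]byte(`{"ticker":"X","price":1.5}`))

	// No error is reported, yet neither field is populated, because none of
	// the incoming keys match "type_url" or "value".
	fmt.Println(err, a.TypeUrl == "", len(a.Value))
}
```

Unknown keys are dropped by encoding/json without complaint, which would explain why both TypeUrl and Value end up empty.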

How can I get this code working?

Woody1193
  • Protobuf `Any` is not *any random protobuf type*, it is a specific structure with a type field and a payload field. You can unmarshal the payload only into the same type it was marshalled from. The assumption is that the producer of that `Any` and the consumer share the same protobuffer schema – blackgreen Jun 02 '22 at 08:36
  • @blackgreen The proto definition I'm using was designed to match the fields on the JSON payload I'm expecting. The issue is that the payload wasn't generated by protobuf so it's missing the `TypeUrl` field. So, I suppose that I'll have to replace `Any` with some other type and do deserialization in two steps instead of one. – Woody1193 Jun 02 '22 at 08:51
  • note that deserializing protobuffer without type information is basically black magic, see [here](https://stackoverflow.com/questions/41348512/protobuf-unmarshal-unknown-message) for details – blackgreen Jun 02 '22 at 08:53
  • I think this is a bad use of generics. I would either specialise the proto definition a bit more, even at the cost of creating multiple messages, or replace `Any` with `Struct` and then unmarshal directly into a struct with JSON. – Lucat Jun 06 '22 at 15:30
  • @Luke I found a solution that I'll post below – Woody1193 Jun 06 '22 at 23:12
  • @Luke This is the exact reason generics were developed. I have many response types embedded within the same overall structure that need to be handled the same way. In the larger context of the project I'm working on, creating several different response types would require a large amount of reused code/reflection at higher levels. Therefore, getting this response right will save a lot of work later. – Woody1193 Jun 07 '22 at 00:06

1 Answer

I found two potential solutions to this problem, both of which involve a custom implementation of UnmarshalJSON. My first attempt was simply to change results to type bytes in my proto definition, but the JSON deserialization failed because the source data wasn't a string or anything else that could be deserialized to []byte directly (encoding/json expects a base64-encoded string for a []byte field). So, I had to roll my own:

Using Struct

Using the google.protobuf.Struct type, I modified my ArrayResponse to look like this:

message ArrayRespone {
    int32 count = 1;
    string next_url = 2;
    string request_id = 3;
    repeated google.protobuf.Struct results = 4;
    string status = 5;
}

and then wrote a custom implementation of UnmarshalJSON that worked like this:

// UnmarshalJSON converts JSON data into a Providers.Polygon.ArrayResponse
func (resp *ArrayRespone) UnmarshalJSON(data []byte) error {

    // First, deserialize the JSON into a mapping between key fields and values
    // If this fails then return an error
    var mapped map[string]interface{}
    if err := json.Unmarshal(data, &mapped); err != nil {
        return fmt.Errorf("failed to perform first-pass unmarshal, error: %v", err)
    }

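    // Next, extract the count from the mapping; if this fails return an error
    // (encoding/json decodes every JSON number in an interface{} as float64,
    // so extract it as such and convert afterwards)
    var count float64
    if err := extractValue(mapped, "count", &count); err != nil {
        return err
    }
    resp.Count = int32(count)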

    // Extract the next URL from the mapping; if this fails return an error
    if err := extractValue(mapped, "next_url", &resp.NextUrl); err != nil {
        return err
    }

    // Extract the request ID from the mapping; if this fails return an error
    if err := extractValue(mapped, "request_id", &resp.RequestId); err != nil {
        return err
    }

    // Extract the status from the mapping; if this fails return an error
    if err := extractValue(mapped, "status", &resp.Status); err != nil {
        return err
    }

    // Now, extract the results array into a temporary variable; if this fails return an error
    var results []interface{}
    if err := extractValue(mapped, "results", &results); err != nil {
        return err
    }

    // Finally, iterate over each result and add it to the slice of results by attempting
    // to convert it to a Struct; if any of these fail to convert then return an error
    resp.Results = make([]*structpb.Struct, len(results))
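    for i, result := range results {
        // Guard the type assertion so a non-object element yields an error, not a panic
        fields, ok := result.(map[string]interface{})
        if !ok {
            return fmt.Errorf("result %d is not a JSON object (%T)", i, result)
        }
        value, err := structpb.NewStruct(fields)
        if err != nil {
            return fmt.Errorf("failed to create struct from result %d, error: %v", i, err)
        }
        resp.Results[i] = value
    }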

    return nil
}

// Helper function that attempts to extract a value from a standard mapping of interfaces
// and set a field with it if the types are compatible
func extractValue[T any](mapping map[string]interface{}, field string, value *T) error {
    if raw, ok := mapping[field]; ok {
        if inner, ok := raw.(T); ok {
            *value = inner
        } else {
            return fmt.Errorf("failed to set value %v to field %s (%T)", raw, field, *value)
        }
    }

    return nil
}

Then, in my service code, I modified the unmarshalling portion of my code to consume the Struct objects. This code relies on the mapstructure package:

func getData[T proto.Message](data []byte) ([]T, error) {

    var resp *ArrayRespone
    if err := json.Unmarshal(data, &resp); err != nil {
        return nil, err
    }
    
    items := make([]T, len(resp.Results))
    for i, result := range resp.Results {
        var item T
        if err := mapstructure.Decode(result.AsMap(), &item); err != nil {
            return nil, err
        }

        items[i] = item
    }

    return items, nil
}

This works as long as every field can be represented by the google.protobuf.Value type. That wasn't the case for me, because several fields in the types I call getData with have custom implementations of UnmarshalJSON. So, the solution I actually chose was to use bytes instead:
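
To illustrate the limitation, here is a minimal sketch using only the standard library. Price and Quote are hypothetical types, not ones from my project. The point is that a custom UnmarshalJSON only runs when a value is decoded from JSON bytes. It does not run when the value is copied out of an already-decoded map, and as far as I can tell that is what a map-based decoder does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Price is a hypothetical type whose JSON form is a quoted decimal string.
type Price float64

// UnmarshalJSON parses a JSON string such as "1.50" into a Price.
func (p *Price) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return err
	}
	*p = Price(f)
	return nil
}

// Quote is a hypothetical result type that uses the custom field.
type Quote struct {
	Ticker string `json:"ticker"`
	Price  Price  `json:"price"`
}

// fromJSON decodes raw bytes, so Price.UnmarshalJSON is invoked.
func fromJSON(data []byte) (Quote, error) {
	var q Quote
	err := json.Unmarshal(data, &q)
	return q, err
}

// fromMap copies out of an already-decoded map. The price is a plain string
// by then, so the custom decoder never runs and the assertion fails.
func fromMap(m map[string]interface{}) (Quote, bool) {
	ticker, okTicker := m["ticker"].(string)
	price, okPrice := m["price"].(Price)
	return Quote{Ticker: ticker, Price: price}, okTicker && okPrice
}

func main() {
	data := []byte(`{"ticker":"X","price":"1.50"}`)

	q, err := fromJSON(data)
	fmt.Println(q, err)

	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		panic(err)
	}
	_, ok := fromMap(m)
	fmt.Println(ok)
}
```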

Using Bytes

For this implementation, I didn't need to rely on any imported types so the message itself was much easier to work with:

message ArrayRespone {
    int32 count = 1;
    string next_url = 2;
    string request_id = 3;
    bytes results = 4;
    string status = 5;
}

This still necessitated the development of a custom implementation for UnmarshalJSON, but that implementation was also simpler:

func (resp *ArrayRespone) UnmarshalJSON(data []byte) error {

    // First, deserialize the JSON into a mapping between key fields and values
    // If this fails then return an error
    var mapped map[string]*json.RawMessage
    if err := json.Unmarshal(data, &mapped); err != nil {
        return fmt.Errorf("failed to perform first-pass unmarshal, error: %v", err)
    }

    // Next, extract the count from the mapping; if this fails return an error
    if err := extractValue(mapped, "count", &resp.Count); err != nil {
        return err
    }

    // Extract the next URL from the mapping; if this fails return an error
    if err := extractValue(mapped, "next_url", &resp.NextUrl); err != nil {
        return err
    }

    // Extract the request ID from the mapping; if this fails return an error
    if err := extractValue(mapped, "request_id", &resp.RequestId); err != nil {
        return err
    }

    // Extract the status from the mapping; if this fails return an error
    if err := extractValue(mapped, "status", &resp.Status); err != nil {
        return err
    }

    // Finally, store the raw JSON of the results array as-is; it is decoded
    // later by the caller, once the element type is known
    if raw, ok := mapped["results"]; ok && raw != nil {
        resp.Results = *raw
    }

    return nil
}

// Helper function that attempts to decode the raw JSON stored under a field
// name into the value provided; missing and null fields are left untouched
func extractValue[T any](mapping map[string]*json.RawMessage, field string, value *T) error {
    if raw, ok := mapping[field]; ok && raw != nil {
        if err := json.Unmarshal(*raw, value); err != nil {
            return fmt.Errorf("failed to set value %s to field %s (%T), error: %v", *raw, field, *value, err)
        }
    }

    return nil
}

Then, I modified my getData function to be:

func getData[T proto.Message](data []byte) ([]T, error) {

    var resp *ArrayRespone
    if err := json.Unmarshal(data, &resp); err != nil {
        return nil, err
    }
    
    var items []T
    if err := json.Unmarshal(resp.Results, &items); err != nil {
        return nil, err
    }

    return items, nil
}

This implementation is simpler and requires one fewer deserialization step, which means less reflection than the Struct implementation.
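
For reference, the same two-step idea can be sketched without protobuf or a custom UnmarshalJSON by letting encoding/json defer the results field as a json.RawMessage. The page struct, the decodePage function and the trade type below are hypothetical stand-ins for the generated message and my real result types, so treat this as an illustration rather than a drop-in replacement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// page is a hypothetical stand-in for the generated ArrayRespone message.
type page struct {
	Count     int32           `json:"count"`
	NextURL   string          `json:"next_url"`
	RequestID string          `json:"request_id"`
	Results   json.RawMessage `json:"results"` // decoding deferred to step two
	Status    string          `json:"status"`
}

// trade is a hypothetical result type.
type trade struct {
	Ticker string  `json:"ticker"`
	Price  float64 `json:"price"`
}

// decodePage decodes the envelope first and then the raw results into []T.
func decodePage[T any](data []byte) (page, []T, error) {
	var p page
	if err := json.Unmarshal(data, &p); err != nil {
		return page{}, nil, fmt.Errorf("decode envelope: %w", err)
	}

	var items []T
	if len(p.Results) > 0 {
		if err := json.Unmarshal(p.Results, &items); err != nil {
			return page{}, nil, fmt.Errorf("decode results: %w", err)
		}
	}

	return p, items, nil
}

func main() {
	data := []byte(`{"count":2,"status":"OK","results":[{"ticker":"A","price":1.5},{"ticker":"B","price":2}]}`)

	p, items, err := decodePage[trade](data)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Count, p.Status, len(items), items[0].Ticker)
}
```

Because the second step goes through json.Unmarshal, any custom UnmarshalJSON on the element type is still honored.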

Woody1193