I'm using the Google Natural Language API for a project that tags text with sentiment analysis, and I want to store the results as JSON. If a direct HTTP request is made to Google, a JSON response is returned.

However, when using the provided Python libraries, an object is returned instead, and that object is not directly JSON serializable.

Here is a sample of my code:

import os
import sys
import oauth2client.client
from google.cloud.gapic.language.v1beta2 import enums, language_service_client
from google.cloud.proto.language.v1beta2 import language_service_pb2

class LanguageReader:
    # class that parses, stores and reports language data from text

    def __init__(self, content=None):

        try:
            # attempts to authenticate credentials from env variable
            oauth2client.client.GoogleCredentials.get_application_default()
        except oauth2client.client.ApplicationDefaultCredentialsError:
            print("=== ERROR: Google credentials could not be authenticated! ===")
            # .get() avoids a KeyError when the variable is not set at all
            print("Current environment variable for this process is: {}".format(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')))
            print("Run:")
            print("   $ export GOOGLE_APPLICATION_CREDENTIALS=/YOUR_PATH_HERE/YOUR_JSON_KEY_HERE.json")
            print("to set the authentication credentials manually")
            sys.exit()

        self.language_client = language_service_client.LanguageServiceClient()
        self.document = language_service_pb2.Document()
        self.document.type = enums.Document.Type.PLAIN_TEXT
        self.encoding = enums.EncodingType.UTF32

        self.results = None

        if content is not None:
            self.read_content(content)

    def read_content(self, content):
        self.document.content = content
        # call the API once and keep the response
        self.results = self.language_client.analyze_sentiment(self.document, self.encoding)

Now if you were to run:

sample_text="I love R&B music. Marvin Gaye is the best. 'What's Going On' is one of my favorite songs. It was so sad when Marvin Gaye died."
resp = LanguageReader(sample_text).results
print(resp)

You would get:

document_sentiment {
  magnitude: 2.40000009537
  score: 0.40000000596
}
language: "en"
sentences {
  text {
    content: "I love R&B music."
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "Marvin Gaye is the best."
    begin_offset: 18
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "\'What\'s Going On\' is one of my favorite songs."
    begin_offset: 43
  }
  sentiment {
    magnitude: 0.40000000596
    score: 0.40000000596
  }
}
sentences {
  text {
    content: "It was so sad when Marvin Gaye died."
    begin_offset: 90
  }
  sentiment {
    magnitude: 0.20000000298
    score: -0.20000000298
  }
}

This is not JSON. It's an instance of google.cloud.proto.language.v1beta2.language_service_pb2.AnalyzeSentimentResponse, and it has no __dict__ attribute, so it cannot be serialized with json.dumps().
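To illustrate the problem without the Google libraries, here is a minimal stdlib-only sketch. The `Sentiment` class is a hypothetical stand-in that uses `__slots__` to mimic an object with no `__dict__`; it is not how the generated protobuf class is actually built.

```python
import json


class Sentiment(object):
    # Hypothetical stand-in: __slots__ suppresses the per-instance __dict__.
    __slots__ = ("score", "magnitude")

    def __init__(self, score, magnitude):
        self.score = score
        self.magnitude = magnitude


s = Sentiment(0.4, 2.4)

# There is no __dict__ to hand to json.dumps().
has_dict = hasattr(s, "__dict__")

# Passing the object itself fails too: json only knows built-in types.
try:
    json.dumps(s)
    dumps_failed = False
except TypeError:
    dumps_failed = True

# Listing the fields by hand works, but has to be maintained manually.
manual = json.dumps(
    {name: getattr(s, name) for name in Sentiment.__slots__},
    sort_keys=True,
)
print(has_dict, dumps_failed, manual)
```

Copying fields by hand like this does not scale to the nested response above, which is why I am looking for a built-in conversion.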

How can I either specify that the response should be in JSON or serialize the object to JSON?

Zach Kagan

1 Answer

Edit: @Zach noted Google's protobuf Data Interchange Format. It seems the preferred option would be to use these protobuf.json_format methods:

from google.protobuf.json_format import MessageToDict, MessageToJson 

self.dict = MessageToDict(self.results)
self.json = MessageToJson(self.results)

From the docstring:

MessageToJson(message, including_default_value_fields=False, preserving_proto_field_name=False)
    Converts protobuf message to JSON format.

    Args:
      message: The protocol buffers message instance to serialize.
      including_default_value_fields: If True, singular primitive fields,
          repeated fields, and map fields will always be serialized.  If
          False, only serialize non-empty fields.  Singular message fields
          and oneof fields are not affected by this option.
      preserving_proto_field_name: If True, use the original proto field
          names as defined in the .proto file. If False, convert the field
          names to lowerCamelCase.

    Returns:
      A string containing the JSON formatted protocol buffer message.
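As a rough sketch of how these calls behave, the example below uses `FileDescriptorProto`, a message class that ships with the protobuf package, as a stand-in for the sentiment response (an assumption made so the snippet runs without API credentials). The `to_jsonable` helper is hypothetical; it only guards against passing something that is not a protobuf message, such as a dict that has already been converted, since `MessageToDict` expects a real message.

```python
import json

from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.message import Message

# Stand-in message (assumption): any generated protobuf message should
# convert the same way as the AnalyzeSentimentResponse in the question.
msg = descriptor_pb2.FileDescriptorProto(name="demo.proto", package="demo")
msg.message_type.add(name="Sentence")

# Default: field names are converted to lowerCamelCase (messageType).
as_dict = MessageToDict(msg)

# Keep the names from the .proto file instead (message_type).
as_snake = MessageToDict(msg, preserving_proto_field_name=True)

# Same conversion, returned as a JSON string.
as_json = MessageToJson(msg)


def to_jsonable(obj):
    """Hypothetical helper: convert protobuf messages, pass others through."""
    if isinstance(obj, Message):
        return MessageToDict(obj)
    return obj


print(sorted(as_dict))
print(sorted(as_snake))
print(json.loads(as_json) == as_dict)
```

The dict form is handy if you want to merge the result into a larger structure before calling `json.dumps()` yourself; the string form is enough if you only need to write the response to disk.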
brennan
  • Thank you for your reply. How common is it for objects to not have __dict__ attributes like this? If I were to define a class myself and initialize it, it would have one by default. Could it be because the Google natural language API was primarily implemented in another language? – Zach Kagan Aug 15 '17 at 14:22
  • It looks like a usability oversight but that's not too uncommon if the API is a "beta" (?). Submitting an issue and/or PR would be nice. Which package are you installing? – brennan Aug 15 '17 at 14:40
  • I think [this is the repo](https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/language/google/cloud/proto/language/v1beta2/language_service_pb2.py) from which the google.cloud.proto.language.v1beta2 is implemented. I think it's parsing the request and making the object there. I will post an issue on that github. – Zach Kagan Aug 15 '17 at 16:59
  • Upon further research, these objects are likely some version or inheritor of the protocol buffer message class. Protocol buffers are how Google serializes data and makes it available across systems and languages. However, the protobuf message lacks a JSON serializing or dict converting method. – Zach Kagan Aug 15 '17 at 19:17
  • Good digging. Not sure if the protobufs are uniform but in some cases we see [dict conversion](https://github.com/GoogleCloudPlatform/google-cloud-python/blob/c24123c2100fe7a6cff64de7cf6eade97c81fb1f/trace/google/cloud/trace/_gax.py#L24-L25). – brennan Aug 15 '17 at 19:38
  • Editing this answer with these findings. – brennan Aug 15 '17 at 19:51
  • This works, and as of now it's probably the best way to do it. Thank you for your help @bren – Zach Kagan Aug 15 '17 at 20:29
  • @ZachKagan, I have tried this but it returns an `AttributeError: 'dict' object has no attribute 'DESCRIPTOR'` error. Can you help me, please? – Abdul Rehman Apr 28 '18 at 03:33