
I have an old.JSON file like this:

[{
    "id": "333333",
    "creation_timestamp": 0,
    "type": "MEDICAL",
    "owner": "MED.com",
    "datafiles": ["stomach.data", "heart.data"]
}]

Then I create an object based on this .proto file:

message Dataset {
  string id = 1;
  uint64 creation_timestamp = 2;
  string type = 3;
  string owner = 4;
  repeated string datafiles = 6;
}

Now I want to save this object back to another .JSON file. I did this:

import json
from google.protobuf.json_format import MessageToJson

with open("new.json", 'w') as jsfile:
    json.dump(MessageToJson(item), jsfile)

As a result I have:

"{\n  \"id\": \"333333\",\n  \"type\": \"MEDICAL\",\n  \"owner\": \"MED.com\",\n  \"datafiles\": [\n    \"stomach.data\",\n    \"heart.data\"\n  ]\n}"

How can I make this file look like the old.JSON file?

Kenenbek Arzymatov
  • In what way was this not like the original? I notice that it's not in a list. Is that the problem? – tdelaney May 07 '17 at 17:56
  • @tdelaney Yes, it is not a list. It also has \" instead of just ", and \n is explicit. – Kenenbek Arzymatov May 07 '17 at 17:59
  • Have you tried `jsfile.write(MessageToJson(item))` directly? – Psidom May 07 '17 at 18:03
  • The list is likely how you save the data in the first place. You defined a message type for a single `dict` inside the list. From what you've posted here I don't know if you have defined another message type for the enclosing list. But if you just encoded each item of that outer list, you lost the list. As for `\n`, try printing the string... they get rendered as newlines. The python representation of a string shows them as \n so you can see them. – tdelaney May 07 '17 at 18:04
  • @Psidom It works, but the result is saved without the enclosing list. I can add `[]` to the file manually, though. – Kenenbek Arzymatov May 07 '17 at 18:06
  • It looks like you are using two different functions, both of which convert python objects to a string. One does its job. The other creates a json dump of a string object (careful to properly quote special characters). You would have better luck picking one or the other. If I am right, then this json library that you're using gives you a string you can just write to the file. You probably should have checked the intermediary value in the debugger. :) – Kenny Ostrom May 08 '17 at 00:29

1 Answer


The weird escaping comes from converting the text to JSON twice, which forces the second call to escape the JSON characters produced by the first call. A detailed explanation follows:

https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-pysrc

"""Contains routines for printing protocol messages in JSON format.

Simple usage example:

  # Create a proto object and serialize it to a json format string.
  message = my_proto_pb2.MyMessage(foo='bar')
  json_string = json_format.MessageToJson(message)

  # Parse a json format string to proto object.
  message = json_format.Parse(json_string, my_proto_pb2.MyMessage())
"""

also

def MessageToJson(message, including_default_value_fields=False):
...
  Returns:
    A string containing the JSON formatted protocol buffer message.

It is pretty clear that this function will return exactly one object of type string. This string contains a lot of json structure, but it's still just a string, as far as python is concerned.

You then pass it to a function which takes a python object (not json), and serializes it to json.

https://docs.python.org/3/library/json.html

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.

Okay, how exactly would you encode a string into JSON? The quotes and newlines inside the string can't appear literally in a JSON string value, so they have to be escaped. Maybe there's an online tool, like http://bernhardhaeussner.de/odd/json-escape/ or http://www.freeformatter.com/json-escape.html

You can go there, paste the starting JSON from the top of your question, tell it to generate the proper JSON, and you get back ... almost exactly what you are getting at the bottom of your question. Cool, everything worked correctly!

(I say almost because one of those links adds some newlines on its own, for no apparent reason. If you encode it with the first link, then decode it with the second, it is exact.)
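The same double encoding can be reproduced locally without an online tool. This is only a rough sketch using the standard json module: `record` and `message_json` are stand-ins I made up for the kind of text MessageToJson(item) returns, so no protobuf install is assumed.

```python
import json

# Stand-in for the string that MessageToJson(item) would return
# (an assumption for illustration; protobuf itself is not used here).
record = {
    "id": "333333",
    "type": "MEDICAL",
    "owner": "MED.com",
    "datafiles": ["stomach.data", "heart.data"],
}
message_json = json.dumps(record, indent=2)

# First encoding: ordinary JSON text.
print(message_json)

# Second encoding: json.dumps sees a plain Python str, so it wraps the
# text in quotes and escapes every inner quote and newline.
double_encoded = json.dumps(message_json)
print(double_encoded)

# Decoding once only strips the outer layer and returns the JSON text.
assert json.loads(double_encoded) == message_json

# Decoding twice is needed to get the data itself back.
assert json.loads(json.loads(double_encoded)) == record
```

The second print should show the same escaped one-line form as the output in the question.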

But that's not the answer you wanted, because you didn't want to double-jsonify the data structure. You just wanted to serialize it to json once, and write that to a file:

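from google.protobuf.json_format import MessageToJson

# MessageToJson already returns JSON text, so write it to the file directly
# instead of passing it through json.dump a second time.
with open("new.json", "w") as jsfile:
    actual_json_text = MessageToJson(item)
    jsfile.write(actual_json_text)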
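If the output also has to be wrapped in a list like old.JSON, one possible approach is to parse each message's JSON text back into a dict and dump the whole list once. The sketch below uses only the standard library; `message_json_texts` is a hypothetical stand-in for `[MessageToJson(item) for item in items]`, and the temporary path is there only so the example can run anywhere.

```python
import json
import os
import tempfile

# Hypothetical stand-in for [MessageToJson(item) for item in items].
message_json_texts = [
    '{\n  "id": "333333",\n  "type": "MEDICAL",\n  "owner": "MED.com",\n'
    '  "datafiles": [\n    "stomach.data",\n    "heart.data"\n  ]\n}',
]

# Parse each JSON text into a dict so that json.dump encodes the data
# exactly once, with the enclosing list included.
records = [json.loads(text) for text in message_json_texts]

path = os.path.join(tempfile.mkdtemp(), "new.json")
with open(path, "w") as jsfile:
    json.dump(records, jsfile, indent=4)

# Read the file back to confirm it holds a list of objects.
with open(path) as jsfile:
    written = json.load(jsfile)
```

With protobuf installed, `json_format.MessageToDict(item)` should give the dict directly and make the `json.loads` step unnecessary.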

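Addendum: MessageToJson might need additional parameters to behave as expected:
  • including_default_value_fields=True (so that fields holding default values, such as creation_timestamp = 0 above, are not dropped from the output)
  • preserving_proto_field_name=True (so that field names such as creation_timestamp are not converted to lowerCamelCase)
(see the comments and links below)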

Kenny Ostrom
  • Yes, MessageToJson looks good, but causes new problem http://stackoverflow.com/questions/43835243/google-protobuf-json-format-messagetojson-changes-names-of-fields-how-to-avoid – Kenenbek Arzymatov May 08 '17 at 14:43
  • The key part of the solution is just to change json.dump to jsfile.write. As the answer points out, we don't want to double-jsonify the message. – Macrophage Dec 26 '19 at 04:50
  • It's 2021, and a new problem arises: `MessageToJson` sometimes truncates message fields. See https://stackoverflow.com/q/69364763/987846 – kakyo Sep 28 '21 at 16:01