14

I'm trying to write a object to a gzipped json file in one step (minimising code, and possibly saving memory space). My initial idea (python3) was thus:

import gzip, json
with gzip.open("/tmp/test.gz", mode="wb") as f:
  json.dump({"a": 1}, f)

This however fails: TypeError: 'str' does not support the buffer interface, which I think has to do with a string not being encoded to bytes. So what is the proper way to do this?

Current solution that I'm unhappy with:

Opening the file in text-mode solves the problem:

import gzip, json
with gzip.open("/tmp/test.gz", mode="wt") as f:
  json.dump({"a": 1}, f)

however I don't like text-mode. In my mind (and maybe this is wrong, but supported by this), text mode is used to fix line-endings. This shouldn't be an issue because json doesn't have line endings, but I don't like it (possibly) messing with my bytes, it (possibly) is slower because it's looking for line-endings to fix, and (worst of all) I don't understand why something about line-endings fixes my encoding problems?

Community
  • 1
  • 1
Claude
  • 8,806
  • 4
  • 41
  • 56
  • Actually I just realised that the gzip-part has nothing to do with this all, I should rewrite the question – Claude May 18 '15 at 08:02

1 Answers1

4

offtopic: I should have dived into the docs a bit further than I initially did.

The python docs show:

Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding (the default being UTF-8). 'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

I don't fully agree that the result from a json encode is a string (I think it should be a set of bytes, since it explicitly defines that it uses utf-8 encoding), but I noticed this before. So I guess text mode it is.

Claude
  • 8,806
  • 4
  • 41
  • 56
  • This https://stackoverflow.com/a/49535758/1335793 quotes the docs "The json module always produces `str` objects, not `bytes` objects." I think this answer explains it well https://stackoverflow.com/questions/39450065/python-3-read-write-compressed-json-objects-from-to-gzip-file?noredirect=1&lq=1 json.dump is really doing 2 things, serializing your dict object to a string, then writing it to a file with an encoding (similar to file.write() ) – Davos Nov 28 '18 at 04:35
  • JSON module is either serializing from an object to a string/file (dumps/dump) or deserializing to an object from a string/file (loads/load), and JSON is a textual representation of data, intended to be stored or transmitted as text, so it does make sense. – Davos Nov 28 '18 at 04:37