I am using Firestore in my app and I want to export the whole database as JSON. I exported my Firestore database (with gcloud firestore export) and downloaded the backup to my computer.

This is my code for parsing the output-x files:

import io
import json
import sys

sys.path.append('/Users/riterrani/Downloads/google-cloud-sdk/platform/google_appengine')
from google.appengine.api.files import records
from google.appengine.datastore import entity_pb
from google.appengine.api import datastore

def default(obj):
  """Default JSON serializer."""
  import calendar, datetime

  if isinstance(obj, datetime.datetime):
    if obj.utcoffset() is not None:
      obj = obj - obj.utcoffset()
    millis = int(
      calendar.timegm(obj.timetuple()) * 1000 +
      obj.microsecond / 1000
    )
    return millis
  raise TypeError('Not sure how to serialize %s' % (obj,))


items = []


f = open('data.json', 'w')
for fileIndex in range(0, 8):
  raw = open('output-' + str(fileIndex), 'rb')  # export files are binary, so open in binary mode
  reader = records.RecordsReader(raw)
  for record in reader:
    entity_proto = entity_pb.EntityProto(contents=record)
    entity = datastore.Entity.FromPb(entity_proto)
    # print entity
    items.append(entity)
    print "Writing " + str(len(items)) + " items to file"
    f.write(json.dumps(entity, default=default, encoding='latin-1'))
    f.write("\n")


f.close()

The script works, but every attribute that is a Firestore Map comes out as garbled text:

{"environment_changes": ["j\u0004j\u0000r\u0000z\u0014\u001a\u0004date \u0000*\n\u001a\b20191101z.\u001a\u0007changes \u0001*!\u001a\u001fEnvironmentChangeType.new_setupz\u00c1\u0001\b\u0013\u001a\u000benvironment \u0000*\u00ad\u0001\u001a\u00aa\u0001j\u0004j\u0000r\u0000z\u0014\u001a\fexposureTime \u0000*\u0002\b\u0012z&\u001a\u0004type \u0000*\u001c\u001a\u001aEnvironmentTypeEnum.indoorz\u0010\u001a\u0004name \u0000*\u0006\u001a\u0004TenrzO\b\u0013\u001a\u0006lights \u0001*A\u001a?j\u0004j\u0000r\u0000z \u001a\u0004type \u0000*\u0016\u001a\u0014LightingTypeEnum.hpsz\u0012\u001a\u0007wattage \u0000*\u0005\u001a\u0003600\u0082\u0001\u0000\u0082\u0001\u0000\u0082\u0001\u0000", "j\u0004j\u0000r\u0000z\u0014\u001a\u0004date \u0000*\n\u001a\b20191101z0\u001a\u0007changes \u0001*#\u001a!EnvironmentChangeType.name_changez6\u001a\u0007changes \u0001*)\u001a'EnvironmentChangeType.exposition_changez\u00c1\u0001\b\u0013\u001a\u000benvironment \u0000*\u00ad\u0001\u001a\u00aa\u0001j\u0004j\u0000r\u0000z\u0014\u001a\fexposureTime \u0000*\u0002\b\u0018z&\u001a\u0004type \u0000*\u001c\u001a\u001aEnvironmentTypeEnum.indoorz\u0010\u001a\u0004name \u0000*\u0006\u001a\u0004TentzO\b\u0013\u001a\u0006lights \u0001*A\u001a?j\u0004j\u0000r\u0000z \u001a\u0004type \u0000*\u0016\u001a\u0014LightingTypeEnum.hpsz\u0012\u001a\u0007wattage \u0000*\u0005\u001a\u0003600\u0082\u0001\u0000\u0082\u0001\u0000z\u00ca\u0001\b\u0013\u001a\u0014original_environment \u0000*\u00ad\u0001\u001a\u00aa\u0001j\u0004j\u0000r\u0000z\u0014\u001a\fexposureTime \u0000*\u0002\b\u0012z&\u001a\u0004type \u0000*\u001c\u001a\u001aEnvironmentTypeEnum.indoorz\u0010\u001a\u0004name \u0000*\u0006\u001a\u0004TenrzO\b\u0013\u001a\u0006lights \u0001*A\u001a?j\u0004j\u0000r\u0000z \u001a\u0004type \u0000*\u0016\u001a\u0014LightingTypeEnum.hpsz\u0012\u001a\u0007wattage \u0000*\u0005\u001a\u0003600\u0082\u0001\u0000\u0082\u0001\u0000\u0082\u0001\u0000", "j\u0004j\u0000r\u0000z\u0014\u001a\u0004date 
\u0000*\n\u001a\b20191117z6\u001a\u0007changes \u0001*)\u001a'EnvironmentChangeType.exposition_changez\u00c1\u0001\b\u0013\u001a\u000benvironment \u0000*\u00ad\u0001\u001a\u00aa\u0001j\u0004j\u0000r\u0000z\u0014\u001a\fexposureTime \u0000*\u0002\b\u0012z&\u001a\u0004type \u0000*\u001c\u001a\u001aEnvironmentTypeEnum.indoorz\u0010\u001a\u0004name \u0000*\u0006\u001a\u0004TentzO\b\u0013\u001a\u0006lights \u0001*A\u001a?j\u0004j\u0000r\u0000z \u001a\u0004type \u0000*\u0016\u001a\u0014LightingTypeEnum.hpsz\u0012\u001a\u0007wattage \u0000*\u0005\u001a\u0003600\u0082\u0001\u0000\u0082\u0001\u0000z\u00ca\u0001\b\u0013\u001a\u0014original_environment \u0000*\u00ad\u0001\u001a\u00aa\u0001j\u0004j\u0000r\u0000z\u0014\u001a\fexposureTime \u0000*\u0002\b\u0018z&\u001a\u0004type \u0000*\u001c\u001a\u001aEnvironmentTypeEnum.indoorz\u0010\u001a\u0004name \u0000*\u0006\u001a\u0004TentzO\b\u0013\u001a\u0006lights \u0001*A\u001a?j\u0004j\u0000r\u0000z \u001a\u0004type \u0000*\u0016\u001a\u0014LightingTypeEnum.hpsz\u0012\u001a\u0007wattage \u0000*\u0005\u001a\u0003600\u0082\u0001\u0000\u0082\u0001\u0000\u0082\u0001\u0000"], "lights": ["j\u0004j\u0000r\u0000z \u001a\u0004type \u0000*\u0016\u001a\u0014LightingTypeEnum.hpsz\u0012\u001a\u0007wattage \u0000*\u0005\u001a\u0003600\u0082\u0001\u0000"],}

How can I decode this to get proper JSON?

I am not a Python developer; I got the code from here

rterrani
  • Make sure you use the right encoding (default is utf-8 not latin-1, but it depends on how you encoded the json when you built the data file) when writing your output file. – BoboDarph Nov 21 '19 at 12:47
  • If I don't specify an encoding I get this error: `"UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position json"`. latin-1 was the only one that worked for me – rterrani Nov 21 '19 at 13:36
  • @rterrani Where did you obtain the "google-cloud-sdk" folder specified in your script? I'm trying to read a Firestore backup as well, but can't even find a version of the google-cloud-sdk with the `google.appengine.api.files` scripts... – Venryx Feb 22 '20 at 04:15
  • While I couldn't find a google-cloud-sdk with the scripts in question, this archive seems to contain them (in google_appengine\google\appengine\api\files): https://console.cloud.google.com/storage/browser/appengine-sdks/deprecated/180 – Venryx Feb 22 '20 at 04:23

2 Answers

It's not an encoding problem.

It seems that your nested objects and lists are still in their serialized LevelDB format. You can write a recursive function that parses every level of your entities.
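
A minimal sketch of that recursive idea, not a tested converter for real exports. It assumes a hypothetical `decode_blob` callback that returns the decoded structure for a value that is still serialized, and `None` for anything else. With the App Engine SDK that callback would presumably wrap the same `EntityProto` parsing the question already uses, but that part is an assumption and is replaced here by a toy stand-in (`toy_decode_blob`) so the example runs on its own.

```python
import json

TEXT_TYPES = (str, type(u""))  # covers str/unicode on Python 2 and str on Python 3


def decode_nested(value, decode_blob):
    """Recursively decode values that are still serialized.

    decode_blob is a hypothetical callback: it returns the decoded
    structure for a serialized value, or None for anything else.
    """
    if isinstance(value, dict):
        return {key: decode_nested(item, decode_blob) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [decode_nested(item, decode_blob) for item in value]
    decoded = decode_blob(value)
    if decoded is None:
        return value  # plain scalar, keep as-is
    # The decoded structure may itself contain serialized values.
    return decode_nested(decoded, decode_blob)


def toy_decode_blob(value):
    """Stand-in decoder: treats strings prefixed with 'BLOB:' as JSON."""
    if isinstance(value, TEXT_TYPES) and value.startswith("BLOB:"):
        return json.loads(value[len("BLOB:"):])
    return None


# A map nested inside a list, and a map that itself contains such a list.
inner = "BLOB:" + json.dumps({"type": "LightingTypeEnum.hps", "wattage": "600"})
outer = "BLOB:" + json.dumps({"name": "Tent", "lights": [inner]})
entity = {"lights": [inner], "environment": outer}

result = decode_nested(entity, toy_decode_blob)
print(json.dumps(result, indent=2, sort_keys=True))
```

The point of recursing on the decoded result is that a decoded map can contain further maps or lists that are still serialized, which is what the `environment` and `lights` fields in the question look like.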

Daniel Kim

It took a while, but I eventually fit all the pieces together and got a Python script working that converts a full-database Firestore backup (made with gcloud) into a standard JSON file.

I've put together the script and its instructions here: https://github.com/Venryx/firestore-leveldb-tools

After installing Python 2.7, and cloning/downloading the repo, just run:

python ToJSON.py PATH_TO_FIRESTORE_BACKUP_FOLDER

(with PATH_TO_FIRESTORE_BACKUP_FOLDER being the direct parent folder of the "output-0", etc. files)

A Data.json file will then be created in the backup folder, with the original database structure (collections as JSON objects, with their documents as keyed entries underneath).
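
To illustrate the output shape described above, here is a small sketch of grouping documents under their collections. It is not taken from the linked repo; `build_tree`, the collection names, and the document IDs are all hypothetical and only show the "collection -> document ID -> data" layout.

```python
import json


def build_tree(documents):
    """Group (collection, doc_id, data) triples into {collection: {doc_id: data}}."""
    tree = {}
    for collection, doc_id, data in documents:
        tree.setdefault(collection, {})[doc_id] = data
    return tree


# Hypothetical, already-decoded documents.
documents = [
    ("grows", "grow1", {"name": "Tent"}),
    ("grows", "grow2", {"name": "Closet"}),
    ("users", "u1", {"nick": "example"}),
]

tree = build_tree(documents)
text = json.dumps(tree, indent=2, sort_keys=True)
print(text)
```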

Venryx