
I'm trying to inspect my App Engine backup files to work out when a data corruption occurred. I used gsutil to locate and download the file:

gsutil ls -l gs://my_backup/ > my_backup.txt
gsutil cp gs://my_backup/LongAlphaString.Mymodel.backup_info file://1.backup_info

I then wrote a small Python program that attempts to read the file and parse it using the App Engine libraries.

#!/usr/bin/python

APPENGINE_PATH='/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/'
ADDITIONAL_LIBS = [
'lib/yaml/lib'
]
import sys
sys.path.append(APPENGINE_PATH)
for l in ADDITIONAL_LIBS:
  sys.path.append(APPENGINE_PATH+l)

import logging
from google.appengine.api import datastore
from google.appengine.api.files import records
import cStringIO

def parse_backup_info_file(content):
  """Returns entities iterator from a backup_info file content."""
  reader = records.RecordsReader(cStringIO.StringIO(content))
  version = reader.read()
  if version != '1':
    raise IOError('Unsupported version')
  return (datastore.Entity.FromPb(record) for record in reader)


INPUT_FILE_NAME='1.backup_info'

logging.basicConfig(level=logging.INFO)  # make logging.info output visible
f=open(INPUT_FILE_NAME, 'rb')
f.seek(0)
content=f.read()
records = parse_backup_info_file(content)
for r in records:
  logging.info(r)

f.close()

The code for parse_backup_info_file was copied from backup_handler.py

When I run the program, I get the following output:

./view_record.py 
Traceback (most recent call last):
  File "./view_record.py", line 30, in <module>
    records = parse_backup_info_file(content)
  File "./view_record.py", line 19, in parse_backup_info_file
    version = reader.read()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 335, in read
    (chunk, record_type) = self.__try_read_record()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 307, in __try_read_record
    (length, len(data)))
EOFError: Not enough data read. Expected: 24898 but got 2112

I've tried with half a dozen different backup_info files, and they all show the same error (with different numbers). I initially noticed that they all reported the same expected length, but that was only because I was reviewing different versions of the same model; it doesn't hold when I look at the backup files of other models.

EOFError: Not enough data read. Expected: 24932 but got 911
EOFError: Not enough data read. Expected: 25409 but got 2220

Is there anything obviously wrong with my approach?

I guess the other option is that the App Engine backup utility is not creating valid backup files. Anything else you can suggest would be very welcome. Thanks in advance.

Hamish Currie

1 Answer


There are multiple metadata files created when an App Engine Datastore backup is run:

LongAlphaString.backup_info is created once. It contains metadata about all of the entity types and backup files that were created in the datastore backup.

LongAlphaString.[EntityType].backup_info is created once per entity type. It contains metadata about the specific backup files created for [EntityType], along with schema information for that entity type.
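
As a rough illustration, using the file names from the question, a listing of the bucket contains both kinds of metadata file (along with the backup's data output files, elided here):

gsutil ls gs://my_backup/
gs://my_backup/LongAlphaString.backup_info
gs://my_backup/LongAlphaString.Mymodel.backup_info
...

The first entry is the overall backup_info; the second is the per-kind file for the Mymodel kind, which is the one the question's script was pointed at.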

Your code works for interrogating the contents of LongAlphaString.backup_info; however, it seems that you are trying to interrogate the contents of LongAlphaString.[EntityType].backup_info. Here's a script that prints the contents of each file type in a human-readable format:

import cStringIO
import os
import sys

sys.path.append('/usr/local/google_appengine')
from google.appengine.api import datastore
from google.appengine.api.files import records
from google.appengine.ext.datastore_admin import backup_pb2

ALL_BACKUP_INFO = 'long_string.backup_info'
ENTITY_KINDS = ['long_string.entity_kind.backup_info']


def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)


print "*****" + ALL_BACKUP_INFO + "*****"
with open(ALL_BACKUP_INFO, 'r') as myfile:
    parsed = parse_backup_info_file(myfile.read())
    for record in parsed:
        print record

for entity_kind in ENTITY_KINDS:
    print os.linesep + "*****" + entity_kind + "*****"
    with open(entity_kind, 'r') as myfile:
        backup = backup_pb2.Backup()
        backup.ParseFromString(myfile.read())
        print backup
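
To run it against a real backup, a minimal usage sketch (assuming the two metadata files have already been downloaded with gsutil as in the question, the placeholder names at the top of the script have been replaced with the actual file names, and the script has been saved as view_backup.py, a file name chosen here purely for illustration):

gsutil cp gs://my_backup/LongAlphaString.backup_info .
gsutil cp gs://my_backup/LongAlphaString.Mymodel.backup_info .
python view_backup.py
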
Caleb
  • According to Google, the google.appengine.api.files package is now deprecated, and I don't see any replacement for those functions inside the suggested Google Cloud Storage library. Is our data locked in Google Cloud Datastore forever now? https://cloud.google.com/appengine/docs/standard/python/refdocs/google.appengine.api.files – Albert S Nov 16 '17 at 18:16