
I keep getting random errors like:

suspended generator _get_tasklet(context.py:329) raised ProtocolBufferDecodeError(corrupted)

or

suspended generator put(context.py:796) raised ValueError(Expecting , delimiter: line 1 column 440 (char 440))

or

suspended generator put(context.py:796) raised ValueError(Invalid \escape: line 1 column 18002 (char 18002))

or

suspended generator _get_tasklet(context.py:329) raised ProtocolBufferDecodeError(truncated)

Everything was working fine up until a couple of days ago, and I haven’t made any changes. When I restart my app, everything is fine for about five minutes until I get a

suspended generator _get_tasklet(context.py:329) raised ProtocolBufferDecodeError(corrupted)

After that point, I get one of the other errors on every database put or get. The table and code that cause the error are different every time, so I have no idea where to begin. These are just regular database puts and gets, like

ndbstate = NdbStateJ.get_by_id(self.screen_name)

or

ndbstate.put()

Google searches haven’t pointed me in any particular direction. Any ideas? The error

Expecting , delimiter: line 1 column 440 (char 440)

might be because some of the fields in some of the tables are JSON. But why all of a sudden?

So maybe I'm not escaping properly somewhere, for example by writing raw strings like r'{...}'. But if there is a bad entry in there somewhere, how do I fix it if I can't query? And why does it break the whole table for all queries? And why is it random? It's not the same query every time.
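To check the hunch that malformed JSON produces exactly these messages, here is a small sketch using only the standard library json module. The two sample strings are made-up examples (hypothetical), not data taken from the actual datastore; decode_error is likewise a hypothetical helper.

```python
import json


def decode_error(raw):
    """Return the json decoder's error message for raw, or None if raw parses."""
    try:
        json.loads(raw)
    except ValueError as e:  # Python 3's json.JSONDecodeError subclasses ValueError
        return str(e)
    return None


# Hypothetical stored values, not taken from the real datastore:
unescaped_quote = '["he said "hi" to me"]'  # inner double quotes were never escaped
bad_escape = '["C:\\data"]'                  # JSON text ["C:\data"]; \d is not a JSON escape

print(decode_error(unescaped_quote))  # an "Expecting ... delimiter" message
print(decode_error(bad_escape))       # an "Invalid \escape" message

# json.dumps escapes the same text correctly, so its output always parses again
print(decode_error(json.dumps(['he said "hi" to me', 'C:\\data'])))
```

The last line shows why the round trip through json.dumps is safe: it escapes quotes and backslashes itself, so these errors only appear when the stored text was not valid JSON to begin with.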

Here’s an example of one of the tables:

class NdbStateJ(ndb.Model):
    last_id = ndb.IntegerProperty()
    last_search_id = ndb.IntegerProperty()
    last_geo_id = ndb.IntegerProperty()
    mytweet_num = ndb.IntegerProperty()
    mentions_processed = ndb.JsonProperty()
    previous_follower_responses = ndb.JsonProperty()
    my_tweets_tweeted = ndb.JsonProperty()
    responses_already_used = ndb.JsonProperty()
    num_followed_by_cyborg = ndb.IntegerProperty(default=0)
    num_did_not_follow_back = ndb.IntegerProperty(default=0)
    language_model_vector = ndb.FloatProperty(repeated=True)
    follow_wait_counter = ndb.IntegerProperty(default=0)

Here’s an example of creating an entity in that table:

ndbstate = NdbStateJ(id=screen_name,
                     last_id=37397357946732541,
                     last_geo_id=37397357946732541,
                     last_search_id=0,
                     mytweet_num=0,
                     mentions_processed=[],
                     previous_follower_responses=[],
                     my_tweets_tweeted=[],
                     responses_already_used=[],
                     language_model_vector=[])
ndbstate.put()
Jonathan Mugan
  • If you try to upload to a different app and run from there, do you get the same error? If you never get it, then it's possible it's corruption on your other app. Also, what do you mean by "restart my app"? – Patrice May 28 '15 at 18:15
  • Cool. Thanks. I finally figured it out. It was a beast. See below. – Jonathan Mugan May 29 '15 at 16:51

1 Answer

It was malformed JSON in the database that caused the problem. I don't know why the problem suddenly started happening everywhere: maybe something changed on Google's side, or maybe I wasn't validating input sufficiently and new users were able to enter malformed data. Who knows.

To fix it, I took inspiration from the answers by nizz (https://stackoverflow.com/users/1011633/nizz) to "App Engine return JSON from JsonProperty", by Mark Amery (https://stackoverflow.com/users/1709587/mark-amery) to "How to escape special characters in building a JSON string?", and by tobias-k (https://stackoverflow.com/users/1639625/tobias-k) to "How do I automatically fix an invalid JSON string?".

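I replaced ndb.JsonProperty() with ExtendedJsonProperty(), a custom property that looks similar to the code below. When a stored value fails to decode, it tries to repair the JSON instead of raising, and falls back to an empty object if it can't.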

import json
import logging
import re

from google.appengine.ext import ndb

logging.getLogger().setLevel(logging.DEBUG)


class ExtendedJsonProperty(ndb.BlobProperty):
    # Inspired by https://stackoverflow.com/questions/18576556/app-engine-return-json-from-jsonproperty
    def _to_base_type(self, value):
        logging.debug('Dumping value ' + str(value))
        try:
            return json.dumps(value)
        except Exception as e:
            logging.warning('trying to fix error dumping from database: ' + str(e))
            return fix_json(value, json.dumps)

    def _from_base_type(self, value):
        # originally return json.loads(value)
        logging.debug('Loading value ' + str(value))
        try:
            return json.loads(value)
        except Exception as e:
            logging.warning('trying to fix error loading from database: ' + str(e))
            return fix_json(value, json.loads)


def fix_json(s, json_fun):
    # Each pass repairs at most one problem; bound the number of passes for safety.
    passes = len(s) if hasattr(s, '__len__') else 1
    for _i in range(max(passes, 1)):
        try:
            return json_fun(s)  # try to parse (or serialize)...
        except Exception as e:
            logging.debug('Exception for json loads: ' + str(e))
            if 'delimiter' in str(e):
                # E.g.: "Expecting , delimiter: line 34 column 54 (char 1158)"
                logging.debug('Escaping quote to fix.')
                fixed = escape_quote(s, e)
            elif 'escape' in str(e):
                # E.g.: "Invalid \escape: line 1 column 9 (char 9)"
                logging.debug('Removing invalid escape to fix.')
                fixed = remove_invalid_escape(s)
            else:
                break
            if fixed == s:
                break  # no progress; stop instead of re-parsing len(s) times
            s = fixed
    # Give up: fall back to an empty JSON object (the stored data for this field is lost).
    # Note json.dumps('{}') would produce the string '"{}"', not an empty object.
    return {} if json_fun is json.loads else '{}'


def remove_invalid_escape(value):
    # Inspired by https://stackoverflow.com/questions/19176024/how-to-escape-special-characters-in-building-a-json-string
    # Consume backslash sequences two characters at a time so an escaped backslash (\\)
    # is never split, keep valid JSON escapes (including \uXXXX), and drop the
    # backslash from invalid ones such as \d.
    return re.sub(r'\\(.)',
                  lambda m: m.group(0) if m.group(1) in '"\\/bfnrtu' else m.group(1),
                  value, flags=re.DOTALL)


def escape_quote(s, e):
    # Inspired by https://stackoverflow.com/questions/18514910/how-do-i-automatically-fix-an-invalid-json-string
    # "Expecting , delimiter: line 34 column 54 (char 1158)"
    match = re.search(r'\(char (\d+)\)', str(e))
    if not match:
        return s
    # position of unexpected character after '"'
    unexp = int(match.group(1))
    # position of unescaped '"' before that
    unesc = s.rfind('"', 0, unexp)
    if unesc == -1:
        return s
    s = s[:unesc] + r'\"' + s[unesc + 1:]
    # position of corresponding closing '"' (+2 for inserted '\')
    closg = s.find('"', unesc + 2)
    if closg != -1 and closg + 2 < len(s):
        logging.debug('Escaping closing quote at %d of %d', closg, len(s))
        s = s[:closg] + r'\"' + s[closg + 1:]
    return s
Jonathan Mugan
  • WOW, that is indeed a doozy. Can you pinpoint on your end what entry caused the issue or which line caused the "faulty" JSON? because if nothing changed on your end regarding this, this might be worth raising the issue on our Issue tracker (https://code.google.com/p/googleappengine/issues/list) – Patrice May 29 '15 at 17:06
  • No, I never could find the source of the faulty JSON. It seemed to break everywhere at once starting May 13 - 15. If a code update was made to App Engine at that time related to the database or JSON, that might be worth looking at. If not, maybe one of my users entered in some data that I didn't sufficiently check, and it just *seemed* like it broke everywhere. – Jonathan Mugan May 29 '15 at 19:46
  • I have some vested interest in this, being a member of the Cloud Platform Support team. I'd like to make sure that if this happens to someone else, the system will catch it. Did you manage to get what input actually "corrupted" your datastore? – Patrice May 29 '15 at 19:57
  • Unfortunately, I have no idea how it all came about. I just noticed that my app was not responding one day, and the log showed all of those error messages. The messages seemed to come from different parts of the database for data corresponding to different users. – Jonathan Mugan May 29 '15 at 20:14
  • hmmm.... okay. I'll try to dig into that and see what I can do with it then. Thanks! – Patrice May 29 '15 at 20:14
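On pinpointing which entry held the faulty JSON: a sketch, assuming the raw stored text can be read without decoding it (for example by temporarily declaring the JSON fields as ndb.BlobProperty, since the answer's ExtendedJsonProperty, like JsonProperty, keeps the JSON text in a blob). The find_malformed helper, the (row_id, raw_text) input format, and the sample rows are all hypothetical.

```python
import json
import re


def find_malformed(rows, context=30):
    """Return (row_id, char_pos, snippet) for every raw JSON value that fails to parse.

    rows: iterable of (row_id, raw_json_text) pairs (hypothetical input format).
    context: number of characters to show on each side of the error position.
    """
    bad = []
    for row_id, raw in rows:
        if raw is None:  # unset property, nothing to check
            continue
        try:
            json.loads(raw)
        except ValueError as e:  # json.JSONDecodeError is a ValueError in Python 3
            # Decode errors end with "(char N)"; fall back to 0 if the format differs.
            match = re.search(r'\(char (\d+)\)', str(e))
            pos = int(match.group(1)) if match else 0
            snippet = raw[max(pos - context, 0):pos + context]
            bad.append((row_id, pos, snippet))
    return bad


# Hypothetical sample rows standing in for one JSON field read from the datastore
sample_rows = [
    ('alice', '["ok"]'),
    ('bob', '["he said "hi""]'),
    ('carol', None),
]
for row_id, pos, snippet in find_malformed(sample_rows):
    print(row_id, pos, snippet)
```

Printing a short snippet around the reported position, rather than the whole value, makes it easier to spot the offending quote or backslash in long fields like the one that failed at char 18002.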