
I am using Google App Engine with the Python runtime to host a dashboard for my team. The dashboard data is stored in Memcache and/or Cloud Datastore, and new data is pulled into the application using the BigQuery API.

import webapp2

from google.appengine.ext import ndb
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials


class ExampleForStackOverflow(webapp2.RequestHandler):

    def get(self):

        credentials = GoogleCredentials.get_application_default()
        bigquery_service = build('bigquery', 'v2', credentials=credentials)

        query = """SELECT field1, field2
                    FROM
                    [table_name];"""

        try:
            timeout = 10000
            num_retries = 5
            query_request = bigquery_service.jobs()
            query_data = {
                'query': query,
                'timeoutMs': timeout,
            }

            query_response = query_request.query(
                projectId='project_name',
                body=query_data).execute(num_retries=num_retries)

            # Insert query response into datastore
            for row in query_response['rows']:
                parent_key = ndb.Key(MyModel, 'default')
                item = MyModel(parent=parent_key)
                item.field1 = row['f'][0]['v']
                item.field2 = row['f'][1]['v']

                item.put()

        except HttpError as err:
            print('Error: {}'.format(err.content))
            raise err

These queries will return an indeterminate number of records. I want the dashboard to display the results regardless of the record count, so ordering by created and then calling fetch() with a fixed limit won't help.

Is it possible to write a query to return everything from the last put() operation?

So far I have tried returning all records written within a certain time window (e.g. How to query all entries from past 6 hours (datetime) in GQL?).

That isn't working reliably because every so often the cron job that queries for the new data fails, leaving me with a blank graph until the cron job runs again the following day.

I need a resilient query that will always return data. Thanks in advance.

afed

2 Answers


You could add a DateTimeProperty to MyModel, let's call it last_put, with the auto_now option set to True. The datetime of the most recent update of each entity would then be captured in its last_put property.

In your get() method you'd start with an ancestor query on the MyModel entities, sorted by last_put descending and fetching only one item - that will be the most recently updated one.

The last_put value of that entity gives the datetime of the last put() you're seeking, which you can then use in your datastore query, as mentioned in the post you referenced, to get the entities written since that datetime.
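The query logic described above can be illustrated outside App Engine with plain Python (the data below is made up; in ndb, step 1 would be an ancestor query ordered by -last_put fetching one entity, and step 2 a filter on last_put equality):

```python
import datetime

# Simulated datastore rows: (last_put, field1) pairs. In ndb these would be
# MyModel entities with last_put = ndb.DateTimeProperty(auto_now=True).
rows = [
    (datetime.datetime(2016, 1, 1, 6, 0), "a"),
    (datetime.datetime(2016, 1, 1, 12, 0), "b"),
    (datetime.datetime(2016, 1, 1, 12, 0), "c"),
]

# Step 1: find the most recent last_put value
# (ndb: ancestor query, order(-MyModel.last_put), fetch one).
newest = max(ts for ts, _ in rows)

# Step 2: keep every row written at that time
# (ndb: filter(MyModel.last_put == newest)).
latest_batch = [field1 for ts, field1 in rows if ts == newest]
```

Because every row in the last put() batch shares the same last_put timestamp (when written in one batch), the equality filter in step 2 returns the whole batch, however many records it contains.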

Dan Cornilescu

Dan's answer led me down the right path but I used a variation of what he suggested (mostly because I don't have a good understanding of ancestor queries). I know this isn't the most efficient way to do this but it'll work for now. Thanks, Dan!

My model:

class MyModel(ndb.Model):
    field1 = ndb.StringProperty(indexed=True)
    field2 = ndb.StringProperty(indexed=True)
    # auto_now_add stamps each entity when it is first written.
    # (default=datetime.datetime.now() would be evaluated only once,
    # at module load time, giving every entity the same timestamp.)
    created = ndb.DateTimeProperty(auto_now_add=True)

My query:

query = MyModel.query().order(-MyModel.created)
latest = query.fetch(1, projection=[MyModel.created])
if latest:  # guard against an empty datastore
    time_created = latest[0].created
    query = MyModel.query()
    query = query.filter(MyModel.created == time_created)
    results = query.fetch()
afed