Need to wipe down a Datastore namespace before test data is uploaded during testing. Using Cloud Datastore API with Python3.

I'm using Datastore with App Engine in Python3. For testing purposes, I have written a script using the Cloud Datastore API to upload several entities of different kinds to datastore. As this is a small project, at the moment there are only 4 kinds and only 2-3 entities per kind.

I want to add a script to my pipeline that wipes a particular Datastore namespace containing my test data. It should run before the data upload and the tests, so that the tests start from a clean slate every time. I'm using Cloud Build to upload the entities to Datastore and run my tests in a Docker container before deploying to App Engine.

At the moment the only solutions I can find are to use Dataflow (total overkill for this, I believe) or to remove each entity individually using its key. I'd prefer to just wipe the entire namespace if possible.

If anyone has any advice or suggestions on how to do this please let me know!

TrebledJ
JTaylor

3 Answers

For App Engine in Python 3 you should use the Google Cloud Client Library for Python.

You will also have to set up Datastore authentication for your pipeline. See https://cloud.google.com/datastore/docs/reference/libraries for how to set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

Steps to delete all the test data

Import the datastore library

from google.cloud import datastore

Create a datastore client

ds_client = datastore.Client()

Query all the entities of your test data kind (pass namespace="..." to query() as well if the data lives in a non-default namespace)

query = ds_client.query(kind="Testdata")

Create a list from the query fetch

entities = list(query.fetch())

Now you can loop through the list and delete all the data:

for entity in entities:
    ds_client.delete(entity.key)

Here is a full example that creates some test data and then deletes all the entities of the test data kind:

from google.cloud import datastore


def create_some_test_data():
    """Function for Creating Test data"""

    kind = 'Testdata'
    number_of_entities = 10

    ds_client = datastore.Client()

    print('-- Creating test data --')

    for i in range(number_of_entities):
        key = ds_client.key(kind)
        entity = datastore.Entity(key=key)
        entity.update({'test_data': i})
        ds_client.put(entity)
        print('Creating entity: {}'.format(entity))


def delete_all_test_data():
    """Function for Deleting all the Test data"""

    kind = 'Testdata'

    ds_client = datastore.Client()
    fetch_limit = 100

    print('-- Deleting all entities --')

    entities = True
    while entities:
        query = ds_client.query(kind=kind)
        entities = list(query.fetch(limit=fetch_limit))
        for entity in entities:
            print('Deleting: {}'.format(entity))
            ds_client.delete(entity.key)


# Execute the functions
create_some_test_data()
delete_all_test_data()

Running the example should output something like this:

-- Creating test data --
Creating entity: <Entity('Testdata', 5664747265458176) {'test_data': 0}>
Creating entity: <Entity('Testdata', 5723707007827968) {'test_data': 1}>
Creating entity: <Entity('Testdata', 5748214695198720) {'test_data': 2}>
Creating entity: <Entity('Testdata', 5683780991844352) {'test_data': 3}>
Creating entity: <Entity('Testdata', 5742950029983744) {'test_data': 4}>
Creating entity: <Entity('Testdata', 5716561121771520) {'test_data': 5}>
Creating entity: <Entity('Testdata', 5148025362055168) {'test_data': 6}>
Creating entity: <Entity('Testdata', 5729450050191360) {'test_data': 7}>
Creating entity: <Entity('Testdata', 5079111831650304) {'test_data': 8}>
Creating entity: <Entity('Testdata', 5681150794137600) {'test_data': 9}>
-- Deleting all entities --
Deleting: <Entity('Testdata', 5079111831650304) {'test_data': 8}>
Deleting: <Entity('Testdata', 5148025362055168) {'test_data': 6}>
Deleting: <Entity('Testdata', 5664747265458176) {'test_data': 0}>
Deleting: <Entity('Testdata', 5681150794137600) {'test_data': 9}>
Deleting: <Entity('Testdata', 5683780991844352) {'test_data': 3}>
Deleting: <Entity('Testdata', 5716561121771520) {'test_data': 5}>
Deleting: <Entity('Testdata', 5723707007827968) {'test_data': 1}>
Deleting: <Entity('Testdata', 5729450050191360) {'test_data': 7}>
Deleting: <Entity('Testdata', 5742950029983744) {'test_data': 4}>
Deleting: <Entity('Testdata', 5748214695198720) {'test_data': 2}>
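
As a variation on the loop above, the sketch below restricts the query to one namespace, fetches keys only, and deletes in batches with `delete_multi` instead of one entity per call. The function name `delete_kind`, its parameters, and the default batch size of 500 (chosen to stay within the commonly documented per-commit entity limit) are assumptions for illustration, not part of the original answer. `client` is expected to be a `google.cloud.datastore.Client`.

```python
def delete_kind(client, kind, namespace, batch_size=500):
    """Delete every entity of `kind` in `namespace`; return how many were deleted.

    `client` is assumed to be a google.cloud.datastore.Client (or any object
    exposing the same query() and delete_multi() methods).
    """
    deleted = 0
    while True:
        # Re-run the query each round, as in the example above,
        # until nothing is left to delete.
        query = client.query(kind=kind, namespace=namespace)
        query.keys_only()  # only the keys are needed for deletion
        keys = [entity.key for entity in query.fetch(limit=batch_size)]
        if not keys:
            return deleted
        client.delete_multi(keys)  # one batched call per round
        deleted += len(keys)
```

It could be called as `delete_kind(datastore.Client(), "Testdata", "my-test-namespace")`, where the namespace name is a placeholder.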
J. Antunes

You can write a Python script that deletes all entities of every kind in a particular namespace, assuming that you know the names of the kinds beforehand.

from google.appengine.ext import ndb
from google.appengine.api import namespace_manager

namespace = "PROVIDE_NAMESPACE_HERE"
# kind_1 ... kind_4 are the ndb.Model subclasses whose entities should be removed
kind_list = [kind_1, kind_2, kind_3, kind_4]

namespace_manager.set_namespace(namespace)  # switch to the namespace provided

for a_kind in kind_list:
    # fetch the keys of all entities of that kind
    kind_keys = a_kind.gql("").fetch(keys_only=True)
    # delete all of those keys in one call
    ndb.delete_multi(kind_keys)

After all entities have been deleted from a namespace, the namespace remains visible in Cloud Datastore for around 24 hours. If it still contains no kinds after that, it is removed automatically.

Hope this answers your question!!

skaul05
    Unfortunately I don't think I can use the `google.appengine` libraries with python 3 - I definitely can't use `ndb`. If I can put in alternatives using the datastore client though, this could work! Thanks! – JTaylor Feb 20 '19 at 11:42

You should be able to use the same Cloud Datastore API as your app, which can be used even from outside Google Cloud (see "How do I use Google datastore for my web app which is NOT hosted in google app engine?"). This answer is based on the docs only; I didn't actually try it.

You'd need to do something along these lines:

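  1. Find all kinds in the namespace of interest. This is possible using the datastore metadata, in particular Kind queries, which can be limited to a namespace like any other query:

    # metadata query: one result per kind, keyed by the kind name
    query = client.query(kind='__kind__', namespace=NAMESPACE)
    query.keys_only()

    # skip reserved kinds (names starting with a double underscore), if any are returned
    kinds = [entity.key.id_or_name for entity in query.fetch()
             if not entity.key.id_or_name.startswith('__')]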
    
  2. Find the keys of the entities of each kind using regular keys-only queries, now that you know the kinds. Limit these queries to the namespace of interest. Delete the entities by key, in batch mode to be more efficient:

    for kind in kinds:
        query = client.query(kind=kind, namespace=NAMESPACE)
        query.keys_only()
        # a keys-only query still yields entities, so extract their keys
        keys = [entity.key for entity in query.fetch()]
        client.delete_multi(keys)
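
The two steps can be combined into one helper, sketched below under a few assumptions: `client` is a `google.cloud.datastore.Client`, the names `list_kinds` and `wipe_namespace` are made up for illustration, and deletions are capped at 500 keys per call to stay within the commonly documented per-commit limit. Like the answer itself, this follows the documented API and has not been run against a live Datastore.

```python
def list_kinds(client, namespace):
    """Return the user-defined kind names found in `namespace`."""
    query = client.query(kind="__kind__", namespace=namespace)
    query.keys_only()
    names = (entity.key.id_or_name for entity in query.fetch())
    # Reserved kinds start with a double underscore; leave them alone.
    return [name for name in names if not name.startswith("__")]


def wipe_namespace(client, namespace, batch_size=500):
    """Delete all entities of every kind in `namespace`.

    Returns a dict mapping each kind name to the number of keys deleted.
    """
    counts = {}
    for kind in list_kinds(client, namespace):
        counts[kind] = 0
        while True:
            query = client.query(kind=kind, namespace=namespace)
            query.keys_only()
            keys = [entity.key for entity in query.fetch(limit=batch_size)]
            if not keys:
                break  # this kind is empty, move on to the next one
            client.delete_multi(keys)
            counts[kind] += len(keys)
    return counts
```

In a pipeline this would run as a step before the upload script, for example `wipe_namespace(datastore.Client(), NAMESPACE)`.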
    
Dan Cornilescu