
I have about 900K entities of a model in a Python GAE app that I would like to export to a CSV file for offline testing. I could use the appcfg.py download_data option, but in this case I don't want to back up to my local machine. I'd like a faster way to create the file on GAE, save it to Google Storage or elsewhere, and download it later from multiple machines.

I'm assuming that I will need to do this in a task since it will likely take more than 30 seconds for the operation to complete.

from google.appengine.ext import db

class MyModel(db.Model):
  foo = db.StringProperty(required=True)
  bar = db.StringProperty(required=True)

def backup_mymodel_to_file():
  # What to do here?
  pass
Chris

2 Answers


Your best option will be to use the mapreduce library to export the relevant data to the blobstore, then upload the completed file to Google Storage.

Note that integration between Google Storage and App Engine is a work in progress.
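As a rough illustration of the per-entity step in such an export, the sketch below turns one entity into one CSV line inside a mapper-style generator. It is plain Python, not the mapreduce library's API: `FakeEntity`, `entity_to_csv_line`, and `export_mapper` are hypothetical names, and the wiring to an output writer that would append each yielded line to a blobstore file is assumed rather than shown.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for a datastore entity with the question's two
# properties, so the sketch runs without the App Engine SDK.
FakeEntity = namedtuple("FakeEntity", ["foo", "bar"])


def entity_to_csv_line(entity):
    """Return one CSV-encoded line (with trailing newline) for an entity."""
    buf = io.StringIO()
    # csv.writer handles quoting of commas and quote characters in values.
    csv.writer(buf, lineterminator="\n").writerow([entity.foo, entity.bar])
    return buf.getvalue()


def export_mapper(entity):
    """Mapper-style generator: yield the text to append to the output file.

    Assumption: the job is configured with an output writer that writes
    every yielded string to the export file.
    """
    yield entity_to_csv_line(entity)


if __name__ == "__main__":
    for e in [FakeEntity("a", "b"), FakeEntity("x,y", 'say "hi"')]:
        for line in export_mapper(e):
            print(line, end="")
```

Using the `csv` module rather than joining values with commas keeps the file valid when a property value itself contains a comma or a quote.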

Nick Johnson
  • Where can I find an example of exporting entity data to the blobstore using mapreduce? I can find examples of exporting to the blobstore, but not using mapreduce. – Chris Sep 09 '11 at 03:00
  • Use the [BlobstoreOutputWriter](http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/src/mapreduce/output_writers.py#516) (how to use a writer should become apparent after learning the basic mapreduce framework and workflow). – Nick Johnson Sep 09 '11 at 03:17
  • Is there an option to maintain order or to sort the final results? If not, does it mean that the only way to maintain order is to perform the blobstore write as part of a task? – Chris Sep 09 '11 at 04:17
  • @Chris Results are key/value pairs, and are sorted by the key. – Nick Johnson Sep 09 '11 at 10:10
  • Thanks Nick. I'm expecting that the general case will be that entities to be exported will be ordered by some property, a date for example. (MyModel.all().order('date').fetch(5000)). So the map reduce operation would need to order results by date as it was writing out to the blobstore. Still feasible? Any blobstore write + mapreduce examples around? – Chris Sep 09 '11 at 10:44
  • @Chris Yes, still feasible. I'm not aware of any examples, but you could ask the mapreduce authors or just familiarize yourself with mapreduce, at which point it should be fairly obvious how to use a different OutputWriter. – Nick Johnson Sep 09 '11 at 11:35
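The ordering point in the comments above (results are key/value pairs sorted by key) can be simulated in plain Python. This is only a sketch of the idea, not the mapreduce library: `DatedEntity`, `map_entity`, and `run_sorted_export` are hypothetical names, and the in-memory sort stands in for whatever sorting the framework does between the map and output stages.

```python
import datetime
from collections import namedtuple

# Hypothetical entity with a date property to order by.
DatedEntity = namedtuple("DatedEntity", ["date", "foo"])


def map_entity(entity):
    """Emit a (key, value) pair keyed by the ISO-formatted date.

    ISO 8601 date strings sort lexicographically in chronological order,
    so sorting by this key orders the rows by date.
    """
    yield entity.date.isoformat(), entity.foo


def run_sorted_export(entities):
    """Simulate map, sort-by-key, then output; return values in key order."""
    pairs = [kv for e in entities for kv in map_entity(e)]
    pairs.sort(key=lambda kv: kv[0])
    return [value for _key, value in pairs]


if __name__ == "__main__":
    data = [
        DatedEntity(datetime.date(2011, 9, 9), "third"),
        DatedEntity(datetime.date(2011, 1, 5), "first"),
        DatedEntity(datetime.date(2011, 6, 30), "second"),
    ]
    print(run_sorted_export(data))
```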

I know this is old, but I posted an example of using the App Engine Mapper API to dump datastore data into Cloud Storage here: Google App Engine: Using Big Query on datastore?

Community
Michael Manoochehri