I have a very simple App Engine application serving gzipped files of 1.8 KB to 3.6 KB stored in the blobstore. A map from numeric file ID to blobkey is stored in the datastore and cached in memcache. The servlet implementation is trivial: a numeric file ID is received in the request, the blobkey is retrieved from memcache/datastore, and the standard BlobstoreService.serve(blobKey, resp) is invoked to serve the response. As expected, the app logs show that response sizes always match the size of the blobstore file that was served.

I've been doing some focused volume testing, and it has revealed that the outgoing bandwidth quota utilization is consistently reported to be roughly 2x what I expect given the requests received. I've been doing runs of 100k requests at a time, summing the bytes received at the client and comparing this with the app logs. Everything balances except for the outgoing bandwidth quota utilization.
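
For illustration, the kind of tally described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual test harness: the class name BandwidthTally, both method names, and the figures in main are hypothetical, and the reported quota value is assumed to be copied by hand from wherever App Engine reports it.

```java
// Hypothetical helper for comparing bytes received at the client
// with the outgoing bandwidth figure reported against the quota.
class BandwidthTally {

    /** Sums the response sizes (in bytes) recorded during a test run. */
    static long totalBytes(long[] responseSizes) {
        long total = 0;
        for (long size : responseSizes) {
            total += size;
        }
        return total;
    }

    /** Ratio of quota-reported outgoing bytes to bytes actually received. */
    static double overheadRatio(long reportedQuotaBytes, long receivedBytes) {
        if (receivedBytes <= 0) {
            throw new IllegalArgumentException("receivedBytes must be positive");
        }
        return (double) reportedQuotaBytes / receivedBytes;
    }

    public static void main(String[] args) {
        // Made-up sample sizes within the 1.8 KB to 3.6 KB range from the question.
        long[] sizes = {1800, 3600, 2700};
        long received = totalBytes(sizes);

        // Made-up quota figure: twice the received total, mirroring the reported symptom.
        long reported = 2 * received;

        System.out.printf("received=%d reported=%d ratio=%.2f%n",
                received, reported, overheadRatio(reported, received));
    }
}
```

A ratio close to 1.0 would mean the quota accounting matches the bytes on the wire; the runs described here correspond to a ratio of roughly 2.0.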

Can anyone help me understand how the outgoing bandwidth quota utilization is determined for the simple application I describe above? What am I missing or not accounting for? Why would it not tally with the totals shown for response sizes in the app logs?

[Update 2013.03.04: I abandoned the use of the blobstore and reverted to storing the blobs directly in the datastore. The outgoing bandwidth utilization is now exactly as expected. It appears that the 2x multiplier is somehow related to the use of the blobstore (but it remains inexplicable). I encountered several other problems with using the blobstore service; most problematic were the additional datastore reads and writes, which are related to the blobinfo and blobindex metadata managed in the datastore (and which are what I was originally trying to reduce by migrating my data to the blobstore). A particularly serious issue for me was this: https://code.google.com/p/googleappengine/issues/detail?id=6849. I consider this a blobstore service memory leak; once you create a blob you can never delete the blob metadata in the datastore. I will be paying for this in perpetuity, since I was foolish enough to run a 24 hr volume and performance test and am now unable to free the storage used during the test. It appears that the blobstore is currently only suitable for use in very specific scenarios (i.e. permanently static data). Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore.]

Dean
  • I am a bit concerned about this as I am also in the test phase - can you not delete the whole application? – Mar 04 '13 at 10:02
  • Peter, I'm not sure. I agree that deleting the application should result in the removal of all data but I'm concerned that I will lose the application ID. I understand that an ID is reserved (effectively forever) once created but it's unclear from the documentation if that ID is reserved for reuse by my account or if it is simply reserved by the system (thereby preventing me from reusing it). I will test with a dummy application and report back. Thanks for the idea though. – Dean Mar 04 '13 at 11:21
  • Looks like deleting the application is the way to go to clear the __BlobFileIndex__. The 72hr wait time for deletion to be performed is a bit annoying, but fortunately all my problems have been uncovered preprod. Reuse of the application ID after deletion doesn't look like it will be a problem. – Dean Mar 04 '13 at 11:33
  • I have to amend the last comment above. Reuse of the application ID turns out to be a problem, despite some general comments in the docs which led me to believe otherwise. After an application has been successfully deleted it does not appear to be possible to reuse the ID. This is quite a shame, and I've now lost my application's ID to the ether simply because I wanted to delete some test data. – Dean Mar 08 '13 at 10:27
  • Why is the appspot application ID that important? Surely you are using your own domain to point to it. I do agree that it should be easier to reset all data. – Mar 08 '13 at 10:37
  • Unfortunately not. The application is a backend web service and so far hasn't been fronted by a proprietary domain (... shortsighted architecture, I know). Lesson learned. – Dean Mar 08 '13 at 13:53
  • There probably is a way to get your ID back; log a request with the Google help people. – Mar 08 '13 at 14:59

1 Answer

The blobstore data can be deleted (I don't recommend it, since it can lead to unexpected behavior), but only if you know the tables it is saved in: __BlobInfo__ and __BlobFileIndex__. These tables exist so that your uploaded files don't end up with the same name and accidentally replace an old file.

For a full list of tables that are stored in datastore you can run SELECT * FROM __kind__.

I am not sure why your App Engine app consumes 2x your expected outgoing bandwidth, but I will test it myself.

An alternative is to use Google Cloud Storage. If you use the default bucket for your App Engine app, you get 5GB of free storage.

"Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore."

That's true; you can use either Cloud Storage or the datastore (Cloud Storage is an immutable object storage service). Blobstore was intended more for uploading files via <input type='file' /> forms. (Writing files from inside the app has recently been deprecated in favor of Cloud Storage.)

Bogdan.Nourescu