26

What is the best/fastest way to check whether an entity exists in the Google App Engine datastore? For now I'm getting the entity by key and checking whether get() returns an error.

I don't know what happens internally when an entity is fetched from the datastore. Is there a faster way to do only this check?

Dan McGrath
Victor
  • getting an entity by key will never return an error, it returns None. – aschmid00 Apr 16 '12 at 14:41
  • 5
    In Java, `get` throws an exception when the entity is not found: https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/DatastoreService#get(com.google.appengine.api.datastore.Key) – Peter Knego Apr 16 '12 at 14:47
  • 1
    I don't really get the question. If you already have a key, how come your entity might not exist? Unless you're using a custom ID/name, or another concurrent request deletes that same entity in the meantime. What I'm saying is, you might want to look at your problem from a different perspective, e.g. what *exactly* are you trying to accomplish in the bigger picture? – alex Apr 16 '12 at 15:40
  • I'm generating the key from some parameters. For example, if I have a user, I can create an Entity for the user's properties using the user_id as the key, simply by calling KeyFactory.createKey(EntityName, id); – Victor Apr 16 '12 at 16:14

3 Answers

6

What you proposed would indeed be the fastest way to know if your entity exists. The main cost is the time it takes to fetch and deserialize the entity, which can be noticeable if the entity is large.

If this action (checking for existence) is a major bottleneck for you and you have large entities, you may want to roll your own system of checking by using two entities. The first is your existing entity with the data. The second is either a small entity that stores a reference to the real entity, or an empty entity whose key is a variation on the original entity's key that you can compute. You can check for existence quickly using the second entity, and then fetch the first entity only if the data is needed.
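As a rough illustration of the two-entity idea, here is a minimal sketch that uses plain in-memory maps as a stand-in for the datastore. The class `MarkerStore`, its methods, and the `"exists:"` key prefix are all hypothetical names chosen for this example; none of them are part of the App Engine API. In a real datastore the payload and the marker would have to be written together in a transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a datastore; NOT the App Engine API.
class MarkerStore {
    // Large payload entities, which are expensive to fetch and deserialize.
    private final Map<String, byte[]> payloads = new HashMap<>();
    // Tiny marker entities; each key is computed from the payload key.
    private final Map<String, Boolean> markers = new HashMap<>();

    // Derive the marker key from the payload key (the prefix is an assumption).
    static String markerKey(String id) {
        return "exists:" + id;
    }

    // Write payload and marker together so they never disagree.
    synchronized void put(String id, byte[] payload) {
        payloads.put(id, payload);
        markers.put(markerKey(id), Boolean.TRUE);
    }

    // Remove payload and marker together.
    synchronized void delete(String id) {
        payloads.remove(id);
        markers.remove(markerKey(id));
    }

    // Cheap existence check: only the small marker is consulted.
    synchronized boolean exists(String id) {
        return markers.containsKey(markerKey(id));
    }

    // Fetch the large payload only when the data is actually needed.
    synchronized byte[] get(String id) {
        return payloads.get(id);
    }
}
```

The trade-off is an extra write on every put and delete in exchange for a cheaper existence check, so it is only worth considering when the entities are large and the check is frequent.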

The better way, I think, would be to design your keys such that you know there will be no duplicates, or to make your operations idempotent, so that even if an old entity is overwritten, it doesn't matter.
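To make the idempotency point concrete, here is a small sketch, again using an in-memory map rather than the real datastore. The class `UserPropertiesStore` and the key format `"UserProperties/<id>"` are assumptions made up for this example. Because the key is computed deterministically from the user id, repeating the same write leaves the store in the same state, and no existence check is needed beforehand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in; NOT the App Engine API.
class UserPropertiesStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Deterministic key: the same user id always maps to the same key.
    static String keyFor(long userId) {
        return "UserProperties/" + userId;
    }

    // Idempotent write: saving the same values twice has the same effect as once.
    void save(long userId, String properties) {
        store.put(keyFor(userId), properties);
    }

    // Returns null when nothing has been saved for this user.
    String load(long userId) {
        return store.get(keyFor(userId));
    }

    // Number of stored records, useful for checking that no duplicates appear.
    int size() {
        return store.size();
    }
}
```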

dragonx
  • I don't think there is a way to avoid this check when several processes run in parallel. The only way to ensure no duplicates is to use a transaction: check if an entity already exists, if not - create a new entity. – Andrei Volgin Nov 22 '13 at 05:13
  • You probably didn't understand my answer. I said pretty much the same thing. And then I added an optimization for the case where you have large entities and you don't want to deserialize the large entity. You could have a second, small entity to check for existence, so that the check returns much faster than fetching the large entity. But you're right, you'd have to write those two entities in a transaction. – dragonx Nov 22 '13 at 15:29
5

com.google.appengine.api has been deprecated in favor of the App Engine GCS client.

Have you considered using a query? Guess-and-check is not a scalable way to find out if an entity exists in a datastore. A query can be created to retrieve entities from the datastore that meet a specified set of conditions:

https://developers.google.com/appengine/docs/java/datastore/queries

EDIT:

What about the key-only query? Key-only queries run faster than queries that return complete entities. To return only the keys, use the Query.setKeysOnly() method.

new Query("Kind").addFilter(Entity.KEY_RESERVED_PROPERTY, FilterOperator.EQUAL, key).setKeysOnly();

Source: http://groups.google.com/group/google-appengine-java/browse_thread/thread/b1d1bb69f0635d46/0e2ba938fad3a543?pli=1

Eric Leschinski
  • 1
    Is the response time of a query better than that of fetching an Entity? – Victor Apr 16 '12 at 14:42
  • 13
    No, query will always take longer and cost more, so I don't think this is the right answer. – Peter Knego Apr 16 '12 at 14:44
  • 3
    A query with .setKeysOnly() actually has the same cost as a single get operation (both in terms of accounting and effective runtime, since the latter is mostly determined by network round-trip time). Also note that a query on KEY_RESERVED_PROPERTY is treated in a special manner: it does not use an eventually consistent index, as a query on any other attribute would, but is strongly consistent instead. This makes this answer perfectly valid, although the savings from fetching only the keys are not specified for this type of query. – Ext3h Apr 08 '14 at 11:42
3

You could fetch using a List<Key> containing only one Key. That method returns a Map<Key, Entity>, and you can check whether it contains an actual value or null for your key, for example:

Entity e = datastoreService.get(Arrays.asList(key)).get(key);

In general though I think it'd be easier to wrap the get() in a try/catch that returns null if the EntityNotFoundException is caught.
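The two approaches in this answer can be compared side by side in a small sketch. To keep it self-contained, it uses a hypothetical in-memory `FakeDatastore` and a hypothetical `NotFoundException`; these only imitate the shape of a single-key get that throws and a batch get that returns a map, and they are not the App Engine classes. The helper names `getOrNull` and `existsViaBatch` are likewise made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical exception standing in for "entity not found".
class NotFoundException extends Exception {
    NotFoundException(String key) {
        super("No entity for key: " + key);
    }
}

// Hypothetical in-memory stand-in; NOT the App Engine API.
class FakeDatastore {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    // Single-key get: throws when the key is missing.
    String get(String key) throws NotFoundException {
        String value = data.get(key);
        if (value == null) {
            throw new NotFoundException(key);
        }
        return value;
    }

    // Batch get: missing keys are simply absent from the returned map.
    Map<String, String> get(List<String> keys) {
        Map<String, String> found = new HashMap<>();
        for (String key : keys) {
            String value = data.get(key);
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }
}

class Lookups {
    // Wrap the throwing get() so callers receive null instead of an exception.
    static String getOrNull(FakeDatastore ds, String key) {
        try {
            return ds.get(key);
        } catch (NotFoundException e) {
            return null;
        }
    }

    // Use the batch form with a one-element list and inspect the result map.
    static boolean existsViaBatch(FakeDatastore ds, String key) {
        return ds.get(Collections.singletonList(key)).containsKey(key);
    }
}
```

Both helpers give the caller the same information; the batch form avoids using an exception for the ordinary "not found" case, while the wrapper keeps the call site closest to a plain get().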

Jason Hall
  • Why is this better than just using get(Key)? – Victor Apr 16 '12 at 14:57
  • It just seems more "correct" to me if you're only going to be fetching for one Key -- but it's entirely personal preference, and up to you. – Jason Hall Apr 16 '12 at 14:59
  • I'm asking which is better in terms of response time. – Victor Apr 16 '12 at 15:00
  • The get() does throw an EntityNotFoundException if the entity is not found. Java code that catches exceptions during normal operation can be an order of magnitude slower and is bad form, because exceptions should be reserved for things the programmer did not expect to happen. http://stackoverflow.com/questions/299068/how-slow-are-java-exceptions – Eric Leschinski Apr 16 '12 at 15:02
  • I don't have any hard data but I'd expect them to be the same. – Jason Hall Apr 16 '12 at 15:03
  • @EricLeschinski The JavaDocs don't mention throwing that exception, but I admit I didn't try it myself. – Jason Hall Apr 16 '12 at 15:06