
I am testing the new App Engine Search API for Java, and I have the following code that tries to add ~3000 documents to an index:

List<Document> documents = new ArrayList<Document>();
for (FacebookAlbum album : user.listAllAlbums()) {
    Document doc = Document.newBuilder()
            .setId(album.getId())
            .addField(Field.newBuilder().setName("name").setText(album.getFullName()))
            .addField(Field.newBuilder().setName("albumId").setText(album.getAlbumId()))
            .addField(Field.newBuilder().setName("createdTime").setDate(Field.date(album.getCreatedTime())))
            .addField(Field.newBuilder().setName("updatedTime").setDate(Field.date(album.getUpdatedTime())))
            .build();
    documents.add(doc);
}

try {
    // Add all the documents.
    getIndex(facebookId).add(documents);
} catch (AddException e) {
    if (StatusCode.TRANSIENT_ERROR.equals(e.getOperationResult().getCode())) {
        // retry adding document
    }
}

However, I am getting the following exception:

Uncaught exception from servlet
java.lang.IllegalArgumentException: number of documents, 3433, exceeds maximum 200
at com.google.appengine.api.search.IndexImpl.addAsync(IndexImpl.java:196)
at com.google.appengine.api.search.IndexImpl.add(IndexImpl.java:380)
at photomemories.buildIndexServlet.doGet(buildIndexServlet.java:47)

Is there a limit of 200 on the number of documents I can insert with a single add call?

If I instead insert one document at a time into the index with the following code:

for (FacebookAlbum album : user.listAllAlbums()) {
    Document doc = Document.newBuilder()
            .setId(album.getId())
            .addField(Field.newBuilder().setName("name").setText(album.getFullName()))
            .addField(Field.newBuilder().setName("albumId").setText(album.getAlbumId()))
            .addField(Field.newBuilder().setName("createdTime").setDate(Field.date(album.getCreatedTime())))
            .addField(Field.newBuilder().setName("updatedTime").setDate(Field.date(album.getUpdatedTime())))
            .build();

    try {
        // Add the document.
        getIndex(facebookId).add(doc);
    } catch (AddException e) {
        if (StatusCode.TRANSIENT_ERROR.equals(e.getOperationResult().getCode())) {
            // retry adding document
        }
    }
}

I am getting the following exception:

com.google.apphosting.api.ApiProxy$OverQuotaException: The API call search.IndexDocument() required more quota than is available.
at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.success(ApiProxyImpl.java:479)
at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.success(ApiProxyImpl.java:382)
at com.google.net.rpc3.client.RpcStub$RpcCallbackDispatcher$1.runInContext(RpcStub.java:786)
at com.google.tracing.TraceContext$TraceContextRunnable$1.run(TraceContext.java:455)

I thought the quota on API calls was 20k/day (see here: https://developers.google.com/appengine/docs/java/search/overview#Quotas).

Any ideas on what is going on?

RLH

3 Answers


There are a few things going on here. Most importantly, and this is something that will be clarified in the documentation very soon, the Search API Call quota also accounts for the number of documents being added/updated. So a single Add call that inserts 10 documents will reduce your daily Search API Call quota by 10.

Yes, the maximum number of documents that may be indexed in a single add call is 200. However, at this stage there is also a short-term burst quota in place that limits you to about 100 API calls per minute.

All of the above means that, for now at least, it is safest not to add more than 100 documents per Add request. Doing so via a Task Queue, as recommended by Shay, is also a very good idea.
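
As an illustration of the batching advice above (not part of the original answer), here is a minimal sketch in plain Java that splits a list into chunks of at most 100 items. The class name `BatchPartitioner` and the method `partition` are hypothetical helpers introduced only for this example; the sketch uses integers in place of `Document` objects so that it runs without the App Engine SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the App Engine Search API.
class BatchPartitioner {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the 3433 documents from the question.
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 3433; i++) {
            docs.add(i);
        }
        List<List<Integer>> batches = partition(docs, 100);
        // prints "35 batches, last has 33"
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).size());
    }
}
```

With real documents, each batch would then be passed to `getIndex(facebookId).add(batch)` in place of the single call from the question, keeping the existing `AddException` handling around every call.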

Peter McKenzie
  • Thanks Peter! Adding ~3k documents was achieved by calling add with one document at a time and having a task queue with a rate limit of 2/s - the rate of the default queue (5/s) was hitting the burst quota. So effectively the burst rate limit is >=120 API calls per minute. – Ioannis Antonellis May 14 '12 at 04:42
  • Question: Is there any benefit (e.g. speed) to calling add with many documents at once, compared with many add calls of one document each? – Ioannis Antonellis May 14 '12 at 04:45
  • Batching several documents into a single add call is a little more efficient. – Peter McKenzie May 14 '12 at 21:29
  • We recently removed the per-minute (burst) Search quota for Free Apps, so you can now use your 20K Search API Call quota as fast as you like. – Peter McKenzie Jul 16 '12 at 01:27

I think (though I can't find confirmation of it) that there is a per-minute quota limit. You should index your documents through a queue so that they are indexed gradually.
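
To make "gradually" concrete, here is a rough sketch in plain Java, not taken from this answer and not using the Task Queue API. It only computes a start delay for each batch so that the average throughput stays under an assumed per-minute document cap. The class `IndexingSchedule`, the method `batchDelaysMillis`, and the cap of 100 documents per minute are all assumptions made for the example.

```java
// Hypothetical helper, not part of any App Engine API.
class IndexingSchedule {

    /**
     * Returns one start delay (in milliseconds) per batch, spaced so that
     * on average no more than maxDocsPerMinute documents are sent per minute.
     */
    static long[] batchDelaysMillis(int totalDocs, int batchSize, int maxDocsPerMinute) {
        if (totalDocs < 0 || batchSize <= 0 || maxDocsPerMinute <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int batchCount = (totalDocs + batchSize - 1) / batchSize; // ceiling division
        long[] delays = new long[batchCount];
        for (int i = 0; i < batchCount; i++) {
            long docsBefore = (long) i * batchSize;
            // Time needed for the earlier documents at the capped rate.
            delays[i] = docsBefore * 60_000L / maxDocsPerMinute;
        }
        return delays;
    }

    public static void main(String[] args) {
        // 3433 documents, batches of 100, assumed cap of 100 documents per minute.
        long[] delays = batchDelaysMillis(3433, 100, 100);
        // prints "35 batches, last starts after 2040 s"
        System.out.println(delays.length + " batches, last starts after "
                + delays[delays.length - 1] / 1000 + " s");
    }
}
```

Each delay could be used as the start offset of one queued indexing task, or the same spacing could be achieved by configuring a rate limit on the queue itself.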

Shay Erlichmen

The docs also mention a per-minute quota; 20k calls per day averages out to only about 13.9 per minute.

https://developers.google.com/appengine/docs/quotas
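
The 13.9 figure is simply the daily quota spread evenly over the 1440 minutes in a day. A small Java sketch of that arithmetic follows; the class `QuotaMath` and the method `perMinute` are hypothetical names used only here.

```java
import java.util.Locale;

// Hypothetical helper for the arithmetic only.
class QuotaMath {

    /** Average number of calls per minute if a daily quota is spread evenly. */
    static double perMinute(long dailyQuota) {
        return dailyQuota / (24.0 * 60.0);
    }

    public static void main(String[] args) {
        // 20000 / 1440 = 13.88..., prints "13.9"
        System.out.println(String.format(Locale.ROOT, "%.1f", perMinute(20000)));
    }
}
```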