Possible Duplicate:
App engine bulk loader download warning "No descending index on key, performing serial download"

My post is very similar to: App engine bulk loader download warning "No descending index on __key__, performing serial download"

I essentially want to do the same thing.

Basically, I'm using the following to download all instances of one of my kinds:

appcfg.py download_data --config_file=bulkloader.yaml --kind=ModelName --filename=ModelName.csv --application=MyAppid --url=http://MyAppid.appspot.com/remote_api

If the kind has more instances than the batch size, then I get this warning:

No descending index on __key__, performing serial download

This makes my download of only around 6,500 entities take 471.4 seconds (according to the bulkloader tool once it completes), which is really slow given that I have about 4 other kinds, each even larger than this (around 15,000 entities)!

Also, according to my Mac's Activity Monitor, I'm only downloading at around 24Kb/second. For reference, the bulkloader output shows these throttle settings:

[INFO    ] Logging to bulkloader-log-20110514.011333
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10

My questions are:

1) How do I get rid of this warning “No descending index on __key__, performing serial download” to get parallel downloading speeds?

I think the answer to my question is to add a descending index. Something like:

<datastore-index kind="Game" ancestor="false" source="manual">
    <property name="id" direction="desc"/>
</datastore-index>

I tried adding this to my datastore-indexes.xml file.

It deployed successfully, but when I looked at my Datastore indices on the admin portal on Google, I didn't see it serving or being built. Anyway, for the sake of it, I reran the command above, and it was still slow.

I also tried adding the same XML, but with source="auto", to the datastore-indexes-auto.xml file. However, when I tried deploying, Eclipse complained with the following error:

java.io.IOException: Error posting to URL: https://appengine.google.com/api/datastore/index/add?app_id=<My_APP_ID>&version=1&
400 Bad Request
Creating a composite index failed: This index:
entity_type: "Game"
ancestor: false
Property {
 name: "id"
 direction: 2
}

is not necessary, since single-property indices are built in. Please remove it from your index file and upgrade to the latest version of the SDK, if you haven't already.

2) Does removing this warning require me to update my auto generated bulkloader.yaml? I've included the Game kind below:

python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Game
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: id
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: __scatter__
      #external_name: __scatter__
      # Type: ShortBlob Stats: 56 properties of this type in this kind.

    - property: genre
      external_name: genre
      # Type: String Stats: 6639 properties of this type in this kind.

    - property: name
      external_name: name
      # Type: String Stats: 6639 properties of this type in this kind.

    - property: releasedate
      external_name: releasedate
      # Type: Date/Time Stats: 6548 properties of this type in this kind.
      import_transform: transform.import_date_time('%Y-%m-%dT%H:%M:%S')
      export_transform: transform.export_date_time('%Y-%m-%dT%H:%M:%S')

Useful find

As I was typing this question, I found this: App Engine Bulk Loader Performance

It basically explains that increasing bandwidth_limit to something reasonable and increasing rps_limit can really help speed things up.

So I tried:

appcfg.py download_data --config_file=bulkloader.yaml --kind=ModelName --filename=ModelName.csv --application=MyAppId --url=http://MyAppId.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100

This decreased the download time to 109.8 seconds, which is a massive reduction!

However, my goal is still to get rid of the “No descending index on __key__, performing serial download” warning so that I can download in parallel.


Extra information in case it might be relevant

I'm using objectify3.0.jar to manipulate my GAE datastore, so my Game kind looks like this:

public class Game {
    @Id private Long id; //This is my key, auto generated by objectify  
    private String name;
    private String genre; 
    private Date releasedate;

    // omitting getters and setters
}

Stewie
  • Your question isn't similar to that other one, it's identical. Did you try adding a descending index on `__key__`, as suggested in my answer, instead of on `id` (which isn't a reserved property, and so won't work)? – Nick Johnson May 15 '11 at 01:35
  • I added the following XML to datastore-indexes.xml and deployed it to Google: ` ` I'm still getting the warning. – Stewie May 15 '11 at 08:18
  • There's a space in your property name. It's `__key__`, not `__ key__`. – Nick Johnson May 15 '11 at 08:27
  • Thanks Nick for the help. I think I have it working now. This is what I tried: I added the same XML, but with source='auto' (shown below), to datastore-indexes-auto.xml in the war/WEB-INF/appengine-generated/ folder and redeployed it to Google. ` ` I then went to the Datastore Indexes tab in my Admin Dashboard and saw __key__ built with desc. So why does my app require this XML to appear in both datastore-indexes.xml and datastore-indexes-auto.xml for indices to build? – Stewie May 15 '11 at 08:30
  • Btw, I also reverted my bulkloader.yaml key property from `id` to `__key__` (shown below) because the key column in my CSV files was blank. `- kind: Game connector: csv connector_options: # TODO: Add connector options here--these are specific to each connector. property_map: - property: __key__ external_name: key export_transform: transform.key_id_or_name_as_string` Also, how do I mark your comment as the answer? – Stewie May 15 '11 at 08:38
  • The XML doesn't have to be in both files - either one would work, and you should add it to only the non-auto one. Don't worry about marking an answer - this question should be closed in favor of the other one. – Nick Johnson May 15 '11 at 17:57
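
For reference, the index that the comments above converge on could look roughly like this in datastore-indexes.xml. This is only a minimal sketch, assuming the usual App Engine Java index file layout: the surrounding `<datastore-indexes>` element and its `autoGenerate="true"` value are assumptions not shown anywhere in the thread, while the kind `Game`, the reserved `__key__` property, and the descending direction come from the question and comments.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Assumed wrapper element; autoGenerate="true" is an illustrative choice -->
<datastore-indexes autoGenerate="true">
    <!-- Descending index on the reserved __key__ property (no space in the name) -->
    <datastore-index kind="Game" ancestor="false" source="manual">
        <property name="__key__" direction="desc"/>
    </datastore-index>
</datastore-indexes>
```

The only difference from the snippet in the question is the property name, `__key__` instead of `id`. After deploying, the index should appear on the Datastore Indexes tab before the bulkloader can use it.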
