My post is very similar to: App engine bulk loader download warning "No descending index on __key__, performing serial download"
I essentially want to do the same thing.
Basically, I'm using the following to download all instances of one of my kinds:
appcfg.py download_data --config_file=bulkloader.yaml --kind=ModelName --filename=ModelName.csv --application=MyAppid --url=http://MyAppid.appspot.com/remote_api
If the kind has more instances than the batch size, then I get this warning:
No descending index on __key__, performing serial download
This is causing my download of only around 6,500 entities to take 471.4 seconds (according to the bulkloader tool once it completes), which is really slow, as I have about 4 other kinds, each even larger than this (around 15,000 entities)!
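To put numbers on that, here is a rough back-of-envelope sketch using the figures above (the 15,000-entity count is my own estimate, not a measured value):

```python
# Back-of-envelope throughput estimate from the figures above.
entities = 6500        # entities in the slow download
seconds = 471.4        # time reported by the bulkloader

rate = entities / seconds        # effective entities per second
projected = 15000 / rate         # projected time for one of the larger kinds

print(round(rate, 1))      # ~13.8 entities/second
print(round(projected))    # ~1088 seconds, i.e. about 18 minutes per kind
```

So at the serial-download rate, each of the larger kinds would take roughly 18 minutes on its own.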
Also, according to my Mac's Activity Monitor, I'm only downloading at around 24 KB/second, well below the throttle limits shown in the bulkloader output:
[INFO ] Logging to bulkloader-log-20110514.011333
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Batch Size: 10
My questions are:
1) How do I get rid of this warning “No descending index on __key__, performing serial download” to get parallel downloading speeds?
I think the answer to my question is to add a descending index, something like:
<datastore-index kind="Game" ancestor="false" source="manual">
<property name="id" direction="desc"/>
</datastore-index>
I tried adding this to my datastore-indexes.xml file.
It deployed successfully, but when I looked at my Datastore indexes in the Google admin console, I didn't see it serving or being built. For the sake of it, I reran the download command above anyway, and it was still slow.
I also tried adding the same XML, but with source="auto", to the datastore-indexes-auto.xml file. However, when I tried deploying, Eclipse complained with the following error:
java.io.IOException: Error posting to URL: https://appengine.google.com/api/datastore/index/add?app_id=<My_APP_ID>&version=1&
400 Bad Request
Creating a composite index failed: This index:
entity_type: "Game"
ancestor: false
Property {
name: "id"
direction: 2
}
is not necessary, since single-property indices are built in. Please remove it from your index file and upgrade to the latest version of the SDK, if you haven't already.
2) Does removing this warning require me to update my auto-generated bulkloader.yaml? I've included the Game kind below:
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:
- kind: Game
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: id
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: __scatter__
      #external_name: __scatter__
      # Type: ShortBlob Stats: 56 properties of this type in this kind.

    - property: genre
      external_name: genre
      # Type: String Stats: 6639 properties of this type in this kind.

    - property: name
      external_name: name
      # Type: String Stats: 6639 properties of this type in this kind.

    - property: releasedate
      external_name: releasedate
      # Type: Date/Time Stats: 6548 properties of this type in this kind.
      import_transform: transform.import_date_time('%Y-%m-%dT%H:%M:%S')
      export_transform: transform.export_date_time('%Y-%m-%dT%H:%M:%S')
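As I understand it, an export_transform in bulkloader.yaml is just a one-argument Python callable: it receives the property value and returns the string written to the CSV, and a custom one can be supplied via a module imported in python_preamble. A minimal hypothetical example (the genre_upper name and behaviour are my own, not from the generated file):

```python
# Hypothetical custom export transform for bulkloader.yaml.
# An export_transform is an ordinary one-argument callable: it receives the
# property value and returns the string to write to the CSV.

def genre_upper(value):
    """Upper-case the genre on export; empty string for missing values."""
    return (value or '').upper()

print(genre_upper('rpg'))   # RPG
print(genre_upper(None))    # prints an empty line
```

It would then be referenced as export_transform: mymodule.genre_upper, with mymodule added to the python_preamble imports.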
Useful find
As I was typing this question, I found App Engine Bulk Loader Performance.
It basically explains that increasing bandwidth_limit to something reasonable and increasing rps_limit can really help speed things up.
So I tried:
appcfg.py download_data --config_file=bulkloader.yaml --kind=ModelName --filename=ModelName.csv --application=MyAppId --url=http://MyAppId.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100
This decreased the download time to 109.8 seconds, a massive reduction!
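A quick sanity check on those two timings (values copied from above):

```python
# Speedup from raising rps_limit, bandwidth_limit and batch_size.
before = 471.4   # seconds with default throttling
after = 109.8    # seconds with the tuned flags

print(round(before / after, 1))   # ~4.3x faster
```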
However, my target is still to get rid of the “No descending index on __key__, performing serial download” warning so that I get parallel downloading.
Extra information, in case it might be relevant
I'm using objectify3.0.jar to manipulate my GAE datastore, so my Game kind looks like this:
public class Game {
    @Id private Long id; // This is my key, auto-generated by Objectify
    private String name;
    private String genre;
    private Date releasedate;
    // omitting getters and setters
}