
For my project I am using the GAE datastore to store data. For backup purposes I decided to use the bulkloader, which downloads the whole dataset into a CSV file perfectly. The upload also runs without errors.

My problem is that the upload does not update the existing data but creates duplicates. Here is an example from the Datastore Viewer:

Before update:

ID/Name 
id=18000
id=20001 

After update:

ID/Name
id=18000
id=20001
name=18000
name=20001 

In the datastore entity I am using this as the data ID:

@PrimaryKey 
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY) 
private Long id; 

Any idea how I can actually update existing data with the bulkloader?

Thanks, Adam

1 Answer


The bulkloader's default settings have a nasty habit of stomping on the types of values. This is especially a problem for keys and lists.

I use the helper functions from bulk_helper and add this to my bulkloader.yaml:

python_preamble:
- import: bulk_helper
...

property_map:
  - property: __key__
    external_name: key
    import_transform: bulk_helper.reverse_str_to_key
    export_transform: bulk_helper.key_to_reverse_str

This preserves the full key, including kind and parent info, and keeps it human readable (if that's important to you).
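
The bulk_helper module itself isn't reproduced in the answer (see the comment below for a working copy), but the idea behind the two transforms is to serialize the full key path on export and rebuild a real key, with numeric IDs kept numeric, on import. A minimal sketch of that idea, assuming a hypothetical 'Kind:id|Kind:id' string format (root ancestor first); the real bulk_helper may differ in format and details:

# Sketch only; not the actual bulk_helper source.
from google.appengine.api import datastore

def key_to_reverse_str(key):
    # Export transform: walk from the entity's key up through its ancestors
    # and render each as 'Kind:id_or_name', root ancestor first.
    parts = []
    while key is not None:
        parts.append('%s:%s' % (key.kind(), key.id_or_name()))
        key = key.parent()
    return '|'.join(reversed(parts))

def reverse_str_to_key(value):
    # Import transform: rebuild the complete key (kind, id/name and ancestors)
    # so the upload overwrites the existing entity instead of creating a new one.
    path = []
    for part in value.split('|'):
        kind, id_or_name = part.split(':', 1)
        # Numeric IDs have to go back in as ints; otherwise the datastore
        # treats them as key names, which is exactly the duplicate problem
        # in the question (id=18000 turning into name=18000).
        path.append(kind)
        path.append(int(id_or_name) if id_or_name.isdigit() else id_or_name)
    return datastore.Key.from_path(*path)

Whatever the exact format, the important part is that the import transform returns a complete key with the numeric ID intact, rather than letting the bulkloader treat the id column as a plain string name.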

    the bulk_helper link doesn't work? but I think I found it here: https://github.com/dahool/app-utils/blob/master/ipdb/bulk_helper.py – slashdottir Feb 20 '13 at 19:40