I'm trying to upload ~3k files (1 kilobyte each) to S3 with boto, using a GreenPool.
My question:
Why does each get_bucket() call take so long, what causes the trade-off with set_contents_from_filename() time, and how can I get around it? Thanks!
More details:
get_bucket(validate=True) takes 30 seconds on average, and the following set_contents_from_filename takes under 1 second. I tried changing to validate=False; this successfully reduced get_bucket() time to under 1 second, but then the time for set_contents_from_filename jumped up to ~30 seconds. I couldn't find the reason for this trade-off in the boto docs.
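To illustrate, here is a minimal sketch of the two modes I'm comparing (the bucket name is a placeholder):

import time
import boto

s3 = boto.connect_s3()

t = time.time()
b = s3.get_bucket('my-bucket', validate=True)   # makes a request to S3 to verify the bucket exists
print('get_bucket(validate=True): %.2fs' % (time.time() - t))

t = time.time()
b = s3.get_bucket('my-bucket', validate=False)  # builds the Bucket object locally, no request sent
print('get_bucket(validate=False): %.2fs' % (time.time() - t))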
Code:
import time
import logging

import boto

S3_TRIES = 3  # retry attempts (value assumed; not shown in the original snippet)

def upload(bucket_str, key_str, file_path):
    # new s3 connection
    s3 = boto.connect_s3()

    # get bucket
    bucket_time = time.time()
    b = s3.get_bucket(bucket_str, validate=True)
    logging.info('get_bucket took %f seconds' % (time.time() - bucket_time))

    # get key
    key_time = time.time()
    key = b.new_key(key_str)
    logging.info('new_key took %f seconds' % (time.time() - key_time))

    # upload, retrying on failure
    for i in range(S3_TRIES):
        try:
            up_time = time.time()
            key.set_contents_from_filename(file_path,
                headers={
                    "Content-Encoding": "gzip",
                    "Content-Type": "application/json",
                },
                policy='public-read')
            logging.info('set_contents_from_filename took %f seconds' % (time.time() - up_time))
            key.set_acl('public-read')
            return True
        except Exception as e:
            logging.info('set_contents exception on iteration %d: %s' % (i, str(e)))
            _e = e
    # all retries failed; re-raise the last exception
    raise _e
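For completeness, this is roughly how I fan out the uploads with eventlet's GreenPool (the pool size, bucket name, and file_paths list here are placeholders for illustration):

import os
import eventlet
eventlet.monkey_patch()  # patch sockets so boto's blocking I/O yields to other greenlets

pool = eventlet.GreenPool(size=50)  # pool size picked arbitrarily for this sketch
for file_path in file_paths:  # file_paths: the ~3k local files to upload
    # key each object by its file name; 'my-bucket' is a placeholder bucket
    pool.spawn_n(upload, 'my-bucket', os.path.basename(file_path), file_path)
pool.waitall()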