
The AWS SDK for Ruby documentation shows code of this form:

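    require 'aws-sdk'   # SDK v1 (2012), which provides the AWS:: namespace
    s3 = AWS::S3.new
    bucket = s3.buckets[bucket_name]
    # View the keys below this prefix as a tree, using '/' as the separator
    tree = bucket.as_tree(:prefix => 'myshop/products')
    # Keep only the "directory" (branch) nodes and collect their prefixes
    directories = tree.children.select(&:branch?).collect(&:prefix)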

This fails with the error: "Unable to find marker in S3 list objects response".

Directory structure:

    /myshop/products/1474472/original.jpg
    /myshop/products/1474472/small.jpg
    /myshop/products/1474472/mini.jpg
    /myshop/products/1333333/original.jpg
    /myshop/products/1333333/small.jpg
    /myshop/products/1333333/mini.jpg
    ...

There are more than 100,000 objects in total.

I want to verify that a given directory (for example "1474472") has been created.

My plan: list the objects in S3, collect them into a Ruby array, then search the array with array.include?.
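If the listing approach is kept, the lookup step can at least use a Set, which gives constant-time membership checks where Array#include? scans every element. The sketch below assumes keys shaped like the listing above (optionally with a leading slash); product_ids is a hypothetical helper name. It still needs the full listing as input, so it only speeds up the in-memory lookup, not the 100+ list requests.

```ruby
require 'set'

# Hypothetical helper: turns a list of object keys (e.g. from an S3 listing)
# into the set of product "directory" IDs, i.e. the key segment right after
# "myshop/products/". Keys outside that prefix are ignored.
def product_ids(keys, base = 'myshop/products/')
  keys.each_with_object(Set.new) do |key, ids|
    k = key.sub(%r{\A/}, '')            # tolerate a leading slash, as in the listing above
    next unless k.start_with?(base)
    id = k[base.length..-1].split('/', 2).first
    ids << id unless id.nil? || id.empty?
  end
end

keys = ['/myshop/products/1474472/original.jpg', '/myshop/products/1333333/mini.jpg']
ids = product_ids(keys)
p ids.include?('1474472') # => true
```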

I need a very fast method; the end of the world is coming soon :)

jrb
memoris
  • I'm not familiar with the Ruby SDK, but S3 only lets you list 1,000 objects at a time, so listing 100,000 objects will take at least 100 HTTP requests. If you want to check for the existence of a particular object, sending a HEAD request for that object is the best way. It sounds like you want to check whether one or more files match a given prefix; can you not just adapt your existing prefix search to include the subdirectory name? – robert_b_clarke Nov 03 '12 at 12:21
  • Hi, a request to AWS takes 288 ms, so 0.3 s × 10,000 = 3,000 s = 50 min; that's very long. – memoris Nov 03 '12 at 22:37
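The arithmetic in these comments can be checked with a quick script. The constants come from the comments (at most 1,000 keys per listing page, roughly 0.3 s per round trip), and the sketch assumes requests run one after another with no parallelism.

```ruby
# Back-of-the-envelope request costs, using the numbers from the comments.
PAGE_SIZE = 1_000      # max keys returned per listing request
ROUND_TRIP_S = 0.3     # approximate seconds per request

# Number of LIST requests needed to page through +object_count+ keys.
def list_requests(object_count)
  (object_count + PAGE_SIZE - 1) / PAGE_SIZE   # integer ceiling division
end

# Total seconds for +requests+ sequential requests.
def sequential_seconds(requests)
  requests * ROUND_TRIP_S
end

puts list_requests(100_000)                      # requests to list the whole bucket
puts sequential_seconds(list_requests(100_000))  # seconds for that full listing
puts sequential_seconds(10_000) / 60             # minutes for one HEAD per product
```

Under these assumptions a full listing needs about 100 requests (roughly 30 s), while one HEAD request per product costs about 50 minutes, so a single listing or a single prefix query per check is far cheaper than per-object requests.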

3 Answers


There is no such thing as a folder in Amazon S3; its namespace is "flat". Have a look at this answer.
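Because S3 stores only keys, the "directories" in the question are just shared key prefixes. When a listing request includes a delimiter, S3 groups keys by the part up to the next delimiter and returns those groups as CommonPrefixes. The plain-Ruby sketch below mimics that grouping locally to illustrate the idea; it is not an SDK call, common_prefixes is a hypothetical name, and it assumes keys without a leading slash.

```ruby
require 'set'

# Simulates, on a plain array of keys, the grouping S3 performs server-side
# for a listing with a prefix and a delimiter: every key that contains the
# delimiter again after the prefix is rolled up into one "common prefix".
# Keys with no further delimiter would come back as ordinary objects and are
# skipped here.
def common_prefixes(keys, prefix, delimiter = '/')
  keys.each_with_object(Set.new) do |key, out|
    next unless key.start_with?(prefix)
    rest = key[prefix.length..-1]
    idx = rest.index(delimiter)
    # keep everything up to and including the delimiter
    out << prefix + rest[0, idx + delimiter.length] if idx
  end.to_a
end

keys = ['myshop/products/1474472/original.jpg',
        'myshop/products/1474472/small.jpg',
        'myshop/products/1333333/mini.jpg']
p common_prefixes(keys, 'myshop/products/')
```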

What you are really looking for is a way to check whether a given prefix ("/myshop/products/1474472", for instance) exists in your bucket.
Their REST API definitely supports this; have a look at the documentation. You need to list the keys (the "file names") that match a given prefix, which can be passed as a parameter. You can also optimize the call by setting the max-keys parameter to 1: if the response contains any items at all, the bucket already contains files whose names start with the given prefix.
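A minimal Ruby sketch of that check, assuming the current v3 SDK (gem aws-sdk-s3) rather than the v1 AWS:: API used in the question. Aws::S3::Client#list_objects_v2 accepts bucket:, prefix: and max_keys:; the helper names prefix_exists? and product_prefix, and the bucket name in the usage comment, are hypothetical. It also assumes keys have no leading slash (matching the as_tree prefix in the question) and appends a trailing slash so that "1474472" does not also match a longer ID such as "14744720".

```ruby
# Returns true if at least one key in +bucket+ begins with +prefix+.
# +client+ is assumed to behave like Aws::S3::Client (v3 SDK); any object
# with a compatible list_objects_v2 method, such as a test stub, also works.
def prefix_exists?(client, bucket, prefix)
  # Ask for at most one key: a single request, regardless of bucket size.
  resp = client.list_objects_v2(bucket: bucket, prefix: prefix, max_keys: 1)
  !Array(resp.contents).empty?
end

# Builds the prefix for one product "directory". The trailing slash stops
# "1474472" from also matching keys such as "myshop/products/14744720/...".
# The "myshop/products/" layout is taken from the question.
def product_prefix(product_id)
  "myshop/products/#{product_id}/"
end

# Example usage (needs the gem and AWS credentials; bucket name is hypothetical):
#   require 'aws-sdk-s3'
#   client = Aws::S3::Client.new(region: 'us-east-1')
#   prefix_exists?(client, 'my-bucket', product_prefix(1474472))
```

One request per check at roughly 0.3 s each is still slow for 10,000 IDs checked in sequence, so this fits best when only a handful of directories need verifying at a time.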

Community
Viccari
    aws s3 cp s3://bucket/tmp/foo/ . --recursive --exclude "*" --include "*1474472/*"

https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters
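The filter rules can be checked locally before running a copy. The Ruby sketch below is a rough model of the rules on the linked page (everything is included by default, and filters that appear later take precedence); it uses File.fnmatch, where "*" can match across "/", and is not the CLI's own implementation. It also assumes patterns are compared against the key path relative to the source prefix; selected? is a hypothetical name. Because a pattern has to match the whole path, selecting the files inside the 1474472 "directory" needs a trailing wildcard such as "*1474472/*".

```ruby
# +filters+ is an ordered list of [:exclude | :include, pattern] pairs,
# in the same order as they would appear on the command line.
def selected?(relative_key, filters)
  selected = true   # everything is included by default
  filters.each do |kind, pattern|
    # Without File::FNM_PATHNAME, "*" also matches "/" in File.fnmatch.
    selected = (kind == :include) if File.fnmatch(pattern, relative_key)
  end
  selected
end

key = '1474472/original.jpg'
p selected?(key, [[:exclude, '*'], [:include, '*1474472']])   # => false
p selected?(key, [[:exclude, '*'], [:include, '*1474472/*']]) # => true
```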

4b0
Saurav Bhowmick

The ideal approach is to maintain the list in your application as you write to S3. EMRFS does the same by storing these details in DynamoDB.

You can also use that list to generate a manifest, e.g. for S3DistCp. This way you avoid listing S3, which is a costly operation.
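A minimal Ruby sketch of keeping such a list at write time, so an existence check never needs a listing. Assumptions: the storage client responds to put_object(bucket:, key:, body:) like Aws::S3::Client in the v3 SDK, and an in-memory Set stands in for a durable store such as a DynamoDB table. The class name ProductIndex is hypothetical.

```ruby
require 'set'

# Records product "directory" IDs as files are written, so lookups are local.
class ProductIndex
  BASE = 'myshop/products/'.freeze   # key layout taken from the question

  # +client+ is assumed to behave like Aws::S3::Client (v3 SDK).
  def initialize(client, bucket)
    @client = client
    @bucket = bucket
    @ids = Set.new   # stand-in for a durable store such as DynamoDB
  end

  # Uploads one file, then records its product ID. The v3 client raises an
  # exception when a put fails, so a failed upload is never indexed.
  def upload(product_id, filename, body)
    key = "#{BASE}#{product_id}/#{filename}"
    @client.put_object(bucket: @bucket, key: key, body: body)
    @ids << product_id.to_s
    key
  end

  # Constant-time membership check; no S3 request involved.
  def exists?(product_id)
    @ids.include?(product_id.to_s)
  end

  # Recorded IDs, e.g. as input for building a manifest.
  def ids
    @ids.to_a
  end
end
```

Keeping the index update in the same code path as the write is the design point here: the index can only drift from S3 if objects are written or deleted by some other route.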

Saurav Bhowmick