
I am trying to check whether a file exists on S3 from RStudio on an Amazon EC2 instance. Base R's exists() and file.exists() functions return FALSE for every file. Here is my code; exists.type exists in S3 and not_exists.type does not.

library("aws.s3")
Sys.setenv("AWS_ACCESS_KEY_ID" = "key1",
           "AWS_SECRET_ACCESS_KEY" = "key2",
           "AWS_DEFAULT_REGION" = "key3"
)

existing_file_path = "s3://bucket_name/folder_name/exists.type"
not_existing_file_path = "s3://bucket_name/folder_name/not_exists.type"

exists(existing_file_path) #returns FALSE
exists(not_existing_file_path) #returns FALSE

file.exists(existing_file_path) #returns FALSE
file.exists(not_existing_file_path) #returns FALSE

aws.s3::get_object(existing_file_path) #reads the entire file
aws.s3::get_object(not_existing_file_path) #gives error

I also tried list.files(); it returns character(0).

penguin

2 Answers


You should use the function head_object(), which only returns metadata about your object without returning the object itself.

aws.s3::head_object("your_file", bucket = "your_bucket")
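Since the paths in the question are full s3:// URIs, a small wrapper can split them into the key and bucket that head_object() takes and reduce the result to a plain TRUE/FALSE. This is a minimal sketch, assuming URIs of the form s3://bucket/key; split_s3_uri() and s3_file_exists() are hypothetical names, and the head_fun argument exists only so the logic can be tried with a stub instead of real AWS credentials.

```r
# Hypothetical helper: split "s3://bucket/key" into its bucket and key parts.
split_s3_uri <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1, startsWith(uri, "s3://"))
  rest  <- sub("^s3://", "", uri)                 # e.g. "bucket/folder/file"
  slash <- regexpr("/", rest, fixed = TRUE)[1]    # position of first "/", or -1
  if (slash < 0) stop("URI has no object key: ", uri)
  list(bucket = substr(rest, 1, slash - 1),
       key    = substr(rest, slash + 1, nchar(rest)))
}

# Hypothetical wrapper: TRUE only when head_object() reports that the object exists.
# head_fun defaults to aws.s3::head_object; the default is only evaluated when it is
# actually used, so passing a stub avoids needing the package or a connection.
s3_file_exists <- function(uri, head_fun = aws.s3::head_object) {
  parts <- split_s3_uri(uri)
  res   <- head_fun(parts$key, bucket = parts$bucket)
  # as.vector() drops the header attributes so isTRUE() also works on older R versions
  isTRUE(as.vector(res))
}
```

With the package loaded and credentials set, the call would be s3_file_exists("s3://bucket_name/folder_name/exists.type"). Note that this collapses every FALSE into "not found", which is exactly the ambiguity the next answer warns about.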
mtoto

You should use aws.s3::head_object, but be careful with its misleading return values: it returns FALSE both when a connection couldn't be established and when a connection was established but the file couldn't be found.

I'd suggest adding some logic to determine whether a returned FALSE really means the file is missing, or whether the file may exist in an S3 bucket you can't access.

obj_metadata <- aws.s3::head_object(object = "exists.type", bucket = "s3://bucket_name/")

if (!is.null(attr(obj_metadata, "connection")) && attr(obj_metadata, "connection") == "close") {
  stop("Could not establish connection with AWS S3")
}

Explanation

The attributes returned with each of the three common return values (TRUE, FALSE for a missing file, and FALSE for a failed connection) are quite different from one another. For example, a TRUE return value comes with attributes like:

  • $`last-modified`
  • $etag
  • $`x-amz-server-side-encryption`
  • $`accept-ranges`
  • $`content-length`

By contrast, a failed connection is the only return value with this attribute:

$connection
[1] "close"

Note as well that a genuine FALSE return value (a connection was established, but the file doesn't exist) carries a subset of the TRUE return value's attributes.
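Taken together, these attribute patterns let you tell the three cases apart. A minimal sketch, assuming head_object() behaves as described in this answer (only a failed connection carries connection = "close"); classify_head_result() is a hypothetical name, not part of aws.s3.

```r
# Hypothetical helper: map a head_object() result onto one of three outcomes,
# using the attribute patterns described in this answer.
classify_head_result <- function(res) {
  conn <- attr(res, "connection")
  # Assumption from this answer: only a failed connection has connection = "close"
  if (!is.null(conn) && identical(conn, "close")) {
    return("connection_failed")
  }
  # Strip the header attributes before testing the logical value itself
  if (isTRUE(as.vector(res))) "exists" else "missing"
}
```

Usage would look like switch(classify_head_result(obj_metadata), connection_failed = stop("Could not establish connection with AWS S3"), exists = ..., missing = ...), where the ... branches are whatever your program should do in each case.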

So, assuming you want to halt the program when a connection couldn't be established, I'd recommend first checking that the connection attribute exists with

!is.null(attr(obj_metadata, "connection"))

Then, to be extra safe, check that the value of that attribute is "close" with:

attr(obj_metadata, "connection") == "close"

Note that I use && instead of & because && short-circuits: when the first condition is FALSE, the second is never evaluated. For an existing file there is no connection attribute, so attr() returns NULL and the second comparison would return logical(0), which makes if() raise an error.
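The difference is easy to reproduce offline. In this sketch the mock value meta is an assumption standing in for a real head_object() result that, like a successful response, has no connection attribute.

```r
# Mock result with no "connection" attribute, like a successful response.
meta <- structure(TRUE, etag = "abc")

attr(meta, "connection")              # NULL, since the attribute is absent
attr(meta, "connection") == "close"   # logical(0): comparing NULL gives a zero-length vector

# `&` evaluates both sides, so the condition is logical(0) and if() signals an error.
with_and <- tryCatch(
  if (!is.null(attr(meta, "connection")) & attr(meta, "connection") == "close") "closed" else "open",
  error = function(e) conditionMessage(e)
)

# `&&` stops at the first FALSE, so the comparison is never evaluated.
with_andand <- if (!is.null(attr(meta, "connection")) && attr(meta, "connection") == "close") "closed" else "open"

with_and      # the error message from if(), not "open" or "closed"
with_andand   # "open"
```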

For reference, the top-voted answer to the same question for Python uses similar reasoning.

Jamie.Sgro