
Following the answers to the question "Load S3 Data into AWS SageMaker Notebook", I tried to load data from an S3 bucket into a SageMaker Jupyter notebook.

I used this code:

import pandas as pd

bucket='my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

pd.read_csv(data_location)

I replaced 'my-bucket' with the ARN (Amazon Resource Name) of my S3 bucket (e.g. "arn:aws:s3:::name-of-bucket") and 'train.csv' with the name of the CSV file stored in the bucket. Otherwise I changed nothing. I got this ValueError:

ValueError: Failed to head path 'arn:aws:s3:::name-of-bucket/name_of_file_V1.csv': Parameter validation failed:
Invalid bucket name "arn:aws:s3:::name-of-bucket": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

What did I do wrong? Do I have to modify the name of my S3 bucket?

Tobitor
  • I found it: I just had to replace `my-bucket` by `name-of-bucket` without the complete ARN, so without `arn:aws:s3:::`. :-D – Tobitor Feb 17 '21 at 10:39

1 Answer


The path should be:

data_location = 's3://{}/{}'.format(bucket, data_key)

where `bucket` is the plain bucket name (`<bucket-name>`), not the ARN. For example, `bucket='my-bucket-333222'`.
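Putting it together, a minimal sketch (the bucket name here is hypothetical; the actual `read_csv` call additionally requires the `s3fs` package and valid AWS credentials):

```python
import pandas as pd

# Use the plain bucket name, not the full ARN ("arn:aws:s3:::...").
bucket = 'name-of-bucket'   # hypothetical bucket name
data_key = 'train.csv'

# Builds 's3://name-of-bucket/train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# df = pd.read_csv(data_location)  # needs s3fs installed and AWS credentials
print(data_location)
```

pandas delegates `s3://` paths to `s3fs`/`boto3` under the hood, which is why the error in the question is a boto3 bucket-name validation failure: the regex accepts either a bare bucket name or an access-point/outpost ARN, and a plain bucket ARN (`arn:aws:s3:::...`) matches neither.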

Marcin
  • Does each sagemaker session has its own default bucket or all sessions share same default bucket? Also if later is true, then do all notebook instances have same default bucket? – Neo Jun 30 '23 at 21:22