
I have CSV files in an S3 bucket, and I want to use them to train a model in SageMaker.

I am using this code, but it gives a "file not found" error:

import boto3
import pandas as pd

# Build the S3 path from the current region's bucket name.
region = boto3.Session().region_name
train_data_location = 's3://taggingu-{}/train.csv'.format(region)

df = pd.read_csv(train_data_location, header=None)
print(df.head())

What can be the solution to this ?

Utsav Shukla
  • Possible duplicate of [Load S3 Data into AWS SageMaker Notebook](https://stackoverflow.com/questions/48264656/load-s3-data-into-aws-sagemaker-notebook) – Hack-R Apr 25 '19 at 16:36

2 Answers


Not sure, but could this Stack Overflow answer help? [Load S3 Data into AWS SageMaker Notebook](https://stackoverflow.com/questions/48264656/load-s3-data-into-aws-sagemaker-notebook)

To quote @Chhoser:

import boto3
import pandas as pd
from sagemaker import get_execution_role

# The notebook's execution role must allow s3:GetObject on the bucket.
role = get_execution_role()

bucket = 'my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

df = pd.read_csv(data_location)
erncyp

You can use the AWS SDK for pandas (awswrangler), a library that extends pandas to work smoothly with AWS data stores.

import awswrangler as wr

# Read the CSV straight from S3 into a DataFrame.
df = wr.s3.read_csv("s3://bucket/file.csv")

Most SageMaker notebook kernels already include it; if it is missing, install it with pip install awswrangler.

Theofilos Papapanagiotou