
I want to read a JSON file from S3 into a SageMaker notebook.

I can do this with pandas; the following code works without error:

import json
import pandas as pd
import boto3


prefix_source = 'folder'

s3 = boto3.resource('s3')
my_bucket_source = s3.Bucket('bucket_source')

for obj in my_bucket_source.objects.filter(Prefix=prefix_source):
        data_location = 's3://{}/{}'.format(obj.bucket_name, obj.key)
        data = pd.read_json(data_location, lines = True )
        display(data.head())

But I don't want to use pandas; I want to use plain Python.

I tried this code:

for obj in my_bucket_source.objects.filter(Prefix=prefix_source):
        data_location = 's3://{}/{}'.format(obj.bucket_name, obj.key)
        with open(data_location, 'r') as f:
            array = json.load(f)
            display(array) 

I got this error:

IOError: [Errno 2] No such file or directory

Brigitte Maillère

1 Answer


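The built-in open() expects a local file system path ("/..."), not an "s3://" URI, so it raises the IOError before json.load() is ever reached. pandas can read the "s3://" URI because it handles S3 access itself; plain Python has to download the object's contents through boto3 first and then parse them.
See the answer here: https://stackoverflow.com/a/47121263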
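
As a rough sketch of that approach: the loop in the question already yields boto3 ObjectSummary objects, and calling get() on one returns a response whose 'Body' can be read as bytes. The helper names parse_json_lines and read_json_lines_from_s3 are made up for this example, and the sketch assumes each file holds one JSON document per line (which is what lines=True in the pandas call suggests) and is UTF-8 encoded.

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text (one JSON document per line) into a list."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_json_lines_from_s3(obj):
    """Download one S3 object and parse it as JSON Lines.

    `obj` is a boto3 ObjectSummary, as yielded by bucket.objects.filter().
    Assumes the object is UTF-8 encoded text.
    """
    body = obj.get()['Body'].read().decode('utf-8')
    return parse_json_lines(body)


def main():
    # Imported here so the helpers above can be used without boto3 installed.
    import boto3

    prefix_source = 'folder'
    s3 = boto3.resource('s3')
    my_bucket_source = s3.Bucket('bucket_source')

    for obj in my_bucket_source.objects.filter(Prefix=prefix_source):
        if obj.key.endswith('/'):
            continue  # skip zero-byte "folder" placeholder keys
        records = read_json_lines_from_s3(obj)
        print(obj.key, records[:5])  # first few records, like data.head()
```

Calling main() in the notebook runs the loop. If a file holds a single JSON document rather than one per line, json.loads(body) on the whole decoded string should work in place of parse_json_lines(body).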

Gili Nachum