
I have an S3 bucket containing a hello.txt.gz file. Is there any way to get the file contents and put them into a pandas DataFrame? So far I can list the file names, but I am not sure how to read the file contents and move them into a DataFrame.

import boto3
import gzip
import pandas as pd

s3_bucket = 'bucket_name'
path = 'folder/subfolder/'
file1 = 'file'  # name of a file in the subfolder whose contents I need

clients3 = boto3.client('s3')

def get_files():
    # list the objects under the prefix and print their keys
    response = clients3.list_objects(Bucket=s3_bucket, Prefix=path, Delimiter="/")
    for obj in response['Contents']:
        print(obj['Key'])

if __name__ == '__main__':
    get_files()
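
To read an object's contents rather than just list its keys, one option is to fetch the bytes with `get_object`, gunzip them in memory, and hand the result to pandas. A minimal sketch, assuming `hello.txt.gz` is a gzipped CSV at the prefix above (adjust `sep`/`header` to match the real file):

import gzip
import io

import boto3
import pandas as pd

clients3 = boto3.client('s3')

# Download the gzipped object, decompress it in memory, and parse it.
obj = clients3.get_object(Bucket='bucket_name', Key='folder/subfolder/hello.txt.gz')
raw = gzip.decompress(obj['Body'].read())  # bytes of the decompressed file
df = pd.read_csv(io.BytesIO(raw))          # returns a DataFrame
print(df.head())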
  • Does this answer your question? [How to import a text file on AWS S3 into pandas without writing to disk](https://stackoverflow.com/questions/37703634/how-to-import-a-text-file-on-aws-s3-into-pandas-without-writing-to-disk) – Michael Delgado Jun 02 '20 at 06:31
  • Specifically, you can just do `pd.read_csv('s3://bucket-name/folder/subfolder/hello.txt.gz', compression='gzip')`. See [this answer](https://stackoverflow.com/a/51777553/3888719) for setup & auth details. – Michael Delgado Jun 02 '20 at 06:32

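Alternatively, as the comments suggest, pandas can read the `s3://` URL directly. A minimal sketch of that approach, assuming the `s3fs` package is installed and AWS credentials are configured (environment variables, `~/.aws/credentials`, or an IAM role):

import pandas as pd

# Requires `pip install s3fs`; credentials are picked up from the usual boto3 sources.
df = pd.read_csv(
    's3://bucket_name/folder/subfolder/hello.txt.gz',
    compression='gzip',  # 'infer' (the default) also works since the key ends in .gz
)
print(df.head())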