Questions tagged [python-s3fs]

For questions related to the Python s3fs library.

Not to be confused with the tag for the s3fs FUSE tool, which is for mounting an S3 bucket at a local mount point and has nothing to do with Python.

85 questions
63 votes, 5 answers

How to read partitioned parquet files from S3 using pyarrow in Python

I'm looking for ways to read data from multiple partitioned directories from S3 using…
stormfield (1,696)
19 votes, 3 answers

Overwrite parquet file with pyarrow in S3

I'm trying to overwrite my parquet files in S3 with pyarrow. I've seen the documentation and I haven't found anything. Here is my code: from s3fs.core import S3FileSystem import pyarrow as pa import pyarrow.parquet as pq s3 =…
Mateo Rod (544)
14 votes, 1 answer

s3fs suddenly stopped working in Google Colab with error "AttributeError: module 'aiobotocore' has no attribute 'AioSession'"

Yesterday the following cell sequence in Google Colab would work. (I am using colab-env to import environment variables from Google Drive.) This morning, when I run the same code, I get the following error. It appears to be a new issue with s3fs…
Andrew Fogg (645)
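Errors like this almost always mean s3fs and its aiobotocore dependency drifted out of sync after an unpinned install. A hedged first step (the package names are real; which exact versions resolve depends on the environment):

```shell
# See which versions ended up installed together
pip show s3fs aiobotocore fsspec

# Reinstalling s3fs lets pip resolve an aiobotocore it declares support for
pip install --upgrade --force-reinstall s3fs
```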
13 votes, 1 answer

How to read parquet file from s3 using dask with specific AWS profile

How to read a parquet file on S3 using dask and a specific AWS profile (stored in a credentials file)? Dask uses s3fs, which uses boto. This is what I have tried: >>>import os >>>import s3fs >>>import boto3 >>>import dask.dataframe as…
muon (12,821)
12 votes, 1 answer

s3fs in Python: passing credentials inline

I am trying to use Python s3fs to read files in AWS S3. I could not find how to pass credentials (access key + secret) in the s3fs code. Can anyone please help me set this along with the s3fs code? import s3fs fs =…
raju (6,448)
9 votes, 2 answers

download file using s3fs

I am trying to download a CSV file from an S3 bucket using the s3fs library. I have noticed that writing a new CSV using pandas has altered the data in some way. So I want to download the file directly in its raw state. The documentation has a download…
Jacky (710)
8 votes, 5 answers

Pandas read_csv specify AWS Profile

Pandas (v1.0.5) uses the s3fs library to connect to AWS S3 and read data. By default, s3fs uses the credentials found in the default profile of the ~/.aws/credentials file. How do I specify which profile pandas should use while reading a CSV from…
Spandan Brahmbhatt (3,774)
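For pandas 1.2 and later, `storage_options` is forwarded to s3fs, which makes this a one-liner; the question's 1.0.5 predates that keyword, where setting the AWS_PROFILE environment variable is the usual fallback. Path and profile below are placeholders.

```python
import pandas as pd

def read_csv_with_profile(path, profile):
    """Read a CSV from S3 with a named AWS profile (pandas >= 1.2).

    storage_options is forwarded to s3fs.S3FileSystem; on older pandas
    this keyword does not exist.
    """
    return pd.read_csv(path, storage_options={"profile": profile})

# df = read_csv_with_profile("s3://my-bucket/data.csv", "dev")
```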
7 votes, 2 answers

s3fs gzip compression on pandas dataframe

I'm trying to write a dataframe as a CSV file on S3 using the s3fs library and pandas. Despite the documentation, I'm afraid the gzip compression parameter isn't working with s3fs. def DfTos3Csv (df,file): with fs.open(file,'wb') as f: …
Julián Gómez (351)
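This symptom matches a known pandas quirk: older releases quietly drop `compression=` when `to_csv` is handed an open file object rather than a path. Compressing explicitly with the standard library sidesteps the issue. A sketch with a local round-trip so the behavior is visible; the `fileobj` could equally be `fs.open('bucket/key.csv.gz', 'wb')` from s3fs (path hypothetical).

```python
import gzip
import io

import pandas as pd

def write_csv_gzip(df, fileobj):
    """Gzip-compress a DataFrame's CSV into any open binary file object.

    Works the same whether fileobj is a local file, io.BytesIO, or an
    s3fs handle from fs.open(..., 'wb').
    """
    payload = df.to_csv(index=False).encode("utf-8")
    with gzip.GzipFile(fileobj=fileobj, mode="wb") as gz:
        gz.write(payload)

# Local round-trip: the buffer now holds real gzip bytes
buf = io.BytesIO()
write_csv_gzip(pd.DataFrame({"a": [1, 2]}), buf)
restored = pd.read_csv(io.BytesIO(buf.getvalue()), compression="gzip")
```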
5 votes, 3 answers

aiobotocore - ImportError: cannot import name 'InvalidIMDSEndpointError'

The code below raises an import exception: import s3fs fs = s3fs.S3FileSystem(anon=False) Exception Traceback (most recent call last): File "issue.py", line 1, in import s3fs File…
balderman (22,927)
4 votes, 1 answer

use AWS_PROFILE in pandas.read_parquet

I'm testing this locally where I have a ~/.aws/config file. ~/.aws/config looks something like: [profile a] ... [profile b] ... I also have an AWS_PROFILE environment variable set to "a". I would like to read a file which is accessible with…
Ray Bell (1,508)
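A hedged sketch of the environment-variable route, which botocore already reads, with one caveat worth knowing about fsspec's instance cache. The path and profile below are placeholders.

```python
import os

import pandas as pd

def read_parquet_with_env_profile(path, profile):
    """Select the AWS profile via the environment botocore reads.

    Caveat: fsspec caches S3FileSystem instances, so AWS_PROFILE must be
    set before the first S3 access in the process; a filesystem created
    earlier keeps its old credentials.
    """
    os.environ["AWS_PROFILE"] = profile
    return pd.read_parquet(path)

# df = read_parquet_with_env_profile("s3://my-bucket/data.parquet", "a")
```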
4 votes, 2 answers

cannot import s3fs in pyspark

When I try importing the s3fs library in pyspark using the following code: import s3fs I get the following error: An error was encountered: cannot import name 'maybe_sync' from 'fsspec.asyn'…
thentangler (1,048)
4 votes, 1 answer

Profile argument in python s3fs

I'm trying to use s3fs in Python to connect to an S3 bucket. The associated credentials are saved in a profile called 'pete' in…
Pete M (154)
4 votes, 1 answer

Load CSV file into Pandas from s3 using chunksize

I'm trying to read a very big file from S3 using... import pandas as pd import s3fs df = pd.read_csv('s3://bucket-name/filename', chunksize=100000) But even after giving the chunk size it is taking forever. Does the chunksize option work when…
Xion (319)
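Worth noting what `chunksize` actually does: it turns `read_csv` into an iterator of DataFrames, bounding memory rather than time; over S3 the download itself still dominates. A local sketch of the mechanics (the same call shape works with an `s3://` path, where s3fs streams the object):

```python
import io

import pandas as pd

# Ten data rows under a single 'x' header column.
csv = io.StringIO("x\n" + "\n".join(str(i) for i in range(10)))

total_rows = 0
for chunk in pd.read_csv(csv, chunksize=4):  # DataFrames of <= 4 rows each
    total_rows += len(chunk)                 # process each piece, then drop it
```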
4 votes, 1 answer

pytest: how to mock s3fs.S3FileSystem file open

I am trying to mockup the call to open a file in a S3 bucket. The code that I have is: # mymodule.py import s3fs #... def __init__(self): self.s3_filesystem = s3fs.S3FileSystem(anon=False, key=s3_key, …