Questions tagged [fsspec]

Filesystem interfaces for Python

A low-level IO library used by many PyData packages. See the docs: https://filesystem-spec.readthedocs.io/en/latest/

39 questions
8
votes
2 answers

dvc (data version control) error - ImportError: cannot import name 'fsspec_loop' from 'fsspec.asyn'

I use Python 3.7.13 and created a virtual environment (venv) for an MLOps project. A dvc package (==2.10.2) that is compatible with Python 3.7.13 is installed in this venv. (venv) (base) tony3@Tonys-MacBook-Pro mlops % dvc…
Tony Peng
  • 579
  • 5
  • 10
4
votes
2 answers

What is the equivalent of connecting to Google Cloud Storage (GCS) like in AWS S3 using s3fs?

I want to access Google Cloud Storage as in the code below. # amazon s3 connection import s3fs as fs with fs.open("s3://mybucket/image1.jpg") as f: image = Image.open(f).convert("RGB") # Is there equivalent code like this on the GCP side? with…
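A minimal sketch of the idea behind the usual answer: gcsfs registers the `gs://`/`gcs://` protocols with fsspec, so the s3fs pattern carries over almost verbatim via the protocol-agnostic `fsspec.open()`. The built-in memory filesystem is used below so the snippet runs without cloud credentials; with gcsfs installed, a hypothetical `"gs://mybucket/image1.jpg"` URL would work in the same call, and the handle could be passed to `Image.open()` as in the s3fs example.

```python
import fsspec

# fsspec.open() dispatches on the URL's protocol; "memory://" here,
# "gs://" with gcsfs installed, "s3://" with s3fs installed.
with fsspec.open("memory://demo/image1.jpg", "wb") as f:
    f.write(b"not-really-a-jpeg")

with fsspec.open("memory://demo/image1.jpg", "rb") as f:
    data = f.read()

print(data)
```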
4
votes
1 answer

use AWS_PROFILE in pandas.read_parquet

I'm testing this locally where I have a ~/.aws/config file. ~/.aws/config looks something like: [profile a] ... [profile b] ... I also have an AWS_PROFILE environment variable set as "a". I would like to read in a file which is accessible with…
Ray Bell
  • 1,508
  • 4
  • 18
  • 45
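A hedged sketch of one common approach: pandas forwards `storage_options` through fsspec to s3fs, and s3fs accepts a `profile` key naming a section of `~/.aws/config`. The helper below only wraps that call (the bucket path in the usage comment is hypothetical, and actually running it requires AWS credentials).

```python
import pandas as pd

def read_parquet_with_profile(path: str, profile: str) -> pd.DataFrame:
    """Read a parquet file from S3 using a named AWS profile.

    pandas passes storage_options through fsspec to s3fs; s3fs
    accepts a "profile" key naming a ~/.aws/config section.
    """
    return pd.read_parquet(path, storage_options={"profile": profile})

# Hypothetical usage (needs real credentials and a real bucket):
# df = read_parquet_with_profile("s3://my-bucket/data.parquet", "a")
```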
3
votes
1 answer

Driver error reading file geodatabase from S3 using geopandas

I'm trying to read a file geodatabase into a GeoDataFrame using the geopandas Python library. The geodatabase is on S3, so I'm using fsspec to read it in, but I'm getting an error: import geopandas as gpd import fsspec fs =…
j sad
  • 1,055
  • 9
  • 16
3
votes
1 answer

How can I speed up reading a CSV/Parquet file from adl:// with fsspec+adlfs?

I have a several-gigabyte CSV file residing in Azure Data Lake. Using Dask, I can read this file in under a minute as follows: >>> import dask.dataframe as dd >>> adl_path = 'adl://...' >>> df = dd.read_csv(adl_path, storage_options={...}) >>>…
user655321
  • 1,572
  • 2
  • 16
  • 33
2
votes
1 answer

Read/write partitioned parquet from/to SFTP server with pyarrow

Recently I got into data analysis with some friends, and to improve our data exchange we got a Linux server which we use as an SFTP server. Following this, we no longer want to write outputs to our local filesystem and then move them to the SFTP…
Drax
  • 23
  • 1
  • 1
  • 4
2
votes
1 answer

How to distinguish between same-named files and directories in Google Drive using fsspec in Python?

I am working with Google Drive in Python using fsspec to perform various operations like listing and downloading files and directories. However, I have encountered a challenge when dealing with items that share the same name. For example, there…
muhammad ali e
  • 655
  • 6
  • 8
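A hedged sketch of the generic mechanism: fsspec listings carry a `"type"` field (`"file"` or `"directory"`), which is how same-named entries can be told apart in `ls(..., detail=True)` output; gdrivefs listings follow the same convention. Demonstrated with the built-in memory filesystem (the names below are illustrative, and unlike Google Drive it cannot hold two entries with the exact same name).

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe("/proj/report.txt", b"contents")      # a file
fs.makedirs("/proj/archive", exist_ok=True)   # a directory

# detail=True returns dicts; entry["type"] distinguishes files from dirs.
for entry in fs.ls("/proj", detail=True):
    print(entry["name"], entry["type"])
```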
2
votes
1 answer

Initialize fsspec DirFileSystem from a URL

I want to initialize an fsspec filesystem based on a URL - both the protocol and the root directory. E.g. I could create a filesystem from gcs://my-bucket/prefix that would use my-bucket on GCS, or file:///tmp/test that would use the /tmp/test…
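A minimal sketch of one way to do this: `fsspec.core.url_to_fs()` resolves a URL into a `(filesystem, root path)` pair, which can then seed a `DirFileSystem`. The same two lines would accept `"gcs://my-bucket/prefix"` or `"file:///tmp/test"`; the memory filesystem is used here so the example runs without credentials.

```python
import fsspec
from fsspec.core import url_to_fs
from fsspec.implementations.dirfs import DirFileSystem

# Seed a file so the resulting DirFileSystem has something to list.
mem = fsspec.filesystem("memory")
mem.pipe("/bucket/prefix/a.txt", b"hello")

# url_to_fs() gives back both the protocol's filesystem and the path;
# DirFileSystem then treats that path as its root.
fs, root = url_to_fs("memory://bucket/prefix")
dirfs = DirFileSystem(root, fs)
print(dirfs.ls("/", detail=False))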
2
votes
1 answer

Prevent any file system usage in Python's pytest

I have a program that, for data security reasons, should never persist anything to local storage if deployed in the cloud. Instead, any input/output needs to be written to the connected (encrypted) storage. To allow deployment locally as…
Thomas
  • 4,696
  • 5
  • 36
  • 71
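A hedged sketch of one approach (routing, rather than outright blocking): if all I/O goes through fsspec URLs, tests can point the code at the in-memory filesystem so nothing touches local disk; truly forbidding filesystem access would instead need something like monkeypatching `builtins.open`. The function and URL below are hypothetical.

```python
import fsspec

def write_payload(url: str, payload: bytes) -> None:
    # Hypothetical application function: all I/O goes through an
    # fsspec URL supplied by configuration.
    with fsspec.open(url, "wb") as f:
        f.write(payload)

def test_write_payload_stays_in_memory():
    # In tests, the configured URL uses the memory:// protocol.
    write_payload("memory://out/report.bin", b"data")
    mem = fsspec.filesystem("memory")
    assert mem.cat("/out/report.bin") == b"data"

test_write_payload_stays_in_memory()  # pytest would collect this itself
```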
2
votes
2 answers

How to get the parent directory with fsspec?

Given a filepath, how do I obtain the parent directory containing the file using fsspec? The filepath can be on a local filesystem or cloud storage, which is why fsspec is preferred.
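A minimal sketch of one answer: `url_to_fs()` splits any fsspec URL into a filesystem and a path, and the filesystem's `_parent()` helper (private, but implemented on `AbstractFileSystem` and protocol-aware) returns the containing directory. Demonstrated with a memory:// URL; a `gs://` or `s3://` URL would go through the same two calls.

```python
from fsspec.core import url_to_fs

def parent_directory(url: str) -> str:
    # url_to_fs() -> (filesystem, stripped path); _parent() then
    # computes the containing directory for that filesystem's rules.
    fs, path = url_to_fs(url)
    return fs._parent(path)

print(parent_directory("memory://data/raw/file.csv"))  # /data/raw
```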
2
votes
1 answer

Reading xarray GOES-16 data directly from S3 without downloading into the system

Reading xarray GOES-16 data directly from S3 without downloading into the system. The issue is that I cannot concatenate S3File objects. I am reading 24 files from S3 and want to extract the data from these files for the time range: This is the…
Naj_m_Om
  • 33
  • 6
2
votes
1 answer

s3fs local filecache of versioned files

I want to use s3fs (based on fsspec) to access files on S3, mainly because of two neat features: local caching of files to disk, with a check for whether files have changed (i.e. a file gets re-downloaded if the local and remote file differ); file version id support for…
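A sketch of the `filecache` wrapper the question relies on: on first read, the target file is copied into `cache_storage` and reused afterwards, with staleness checks against the remote. With s3fs, the target would be an `s3://` path (optionally with `version_aware=True` in the target options); the local filesystem stands in below so the example runs offline.

```python
import os
import tempfile

import fsspec

# A local "remote" file, standing in for an object on S3.
src = tempfile.mkdtemp()
cache = tempfile.mkdtemp()
with open(os.path.join(src, "data.bin"), "wb") as f:
    f.write(b"payload")

# filecache:// wraps the target protocol; the file is downloaded into
# cache_storage on first open and served from there subsequently.
with fsspec.open(
    f"filecache://{src}/data.bin",
    mode="rb",
    target_protocol="file",
    cache_storage=cache,
) as f:
    data = f.read()

print(data)                # b'payload'
print(os.listdir(cache))   # the cached copy plus cache metadata
```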
2
votes
0 answers

Azure Databricks: pandas.read_parquet error

I get an error reading parquet into pandas in Databricks, like the following: does anyone have an idea? Following is my Databricks runtime and my pandas version.
mytabi
  • 639
  • 2
  • 12
  • 28
2
votes
1 answer

open remote zarr store with many groups and keep coordinates using xarray

I would like to read the remote zarr store at https://hrrrzarr.s3.amazonaws.com/index.html#sfc/20210208/20210208_00z_anl.zarr/. Info on the zarr store is at https://mesowest.utah.edu/html/hrrr/zarr_documentation/zarrFileVariables.html. I am able…
Ray Bell
  • 1,508
  • 4
  • 18
  • 45
2
votes
1 answer

open_mfdataset() on remote zarr store giving zarr.errors.GroupNotFoundError

I'm looking to read a remote zarr store using xarray.open_mfdataset(), but I'm getting a zarr.errors.GroupNotFoundError: group not found at path ''. Traceback at the bottom. import xarray as xr import s3fs fs = s3fs.S3FileSystem(anon=True) uri =…
Ray Bell
  • 1,508
  • 4
  • 18
  • 45