I want to use s3fs (which is based on fsspec) to access files on S3, mainly because of two neat features:
- local caching of files to disk, with a check for changes, i.e. a file gets redownloaded if the local and remote copies differ
- file version id support for versioned S3 buckets, i.e. the ability to open different versions of the same remote file based on their version id
I don't need this for high-frequency use and the files don't change often. It is mainly for unit/integration test data stored on S3, which changes only when tests and the related test data get updated (hence the versions!).
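For context, this is roughly how I use each feature on its own (a sketch; my_bucket/my_file.txt is a placeholder and version_id is assumed to hold a valid S3 version ID):

import fsspec
import s3fs

# Caching alone: filecache wraps S3 and re-downloads when the remote changes
fs = fsspec.filesystem("filecache", target_protocol="s3",
                       cache_storage="/tmp/aws", check_files=True)

# Versioning alone: a version-aware S3 filesystem can open a specific version
s3 = s3fs.S3FileSystem(version_aware=True)
with s3.open("my_bucket/my_file.txt", "r", version_id=version_id) as f:
    text = f.read()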
I got both of the above working separately just fine, but I can't get the combination of the two to work. That is, I want to be able to cache different versions of the same file locally, but as soon as a filecache is involved, the version id disambiguation seems to be lost:
import fsspec

fs = fsspec.filesystem("filecache", target_protocol="s3",
                       cache_storage="/tmp/aws", check_files=True,
                       version_aware=True)
with fs.open("s3://my_bucket/my_file.txt", "r", version_id=version_id) as f:
    text = f.read()
No matter what version_id is, I always get the most recent file from S3, which is also the one that gets cached locally.
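As far as I can tell, the cache keeps a single local copy per remote path (the cache key seems to be derived from the path alone, without the version), which would explain this. A quick way to see it, assuming the cache directory from above:

import os

# After requesting two different versions of the same key, /tmp/aws still
# contains only one data file plus fsspec's cache metadata.
print(os.listdir("/tmp/aws"))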
What I expect is to always get the correct file version, with the local cache either keeping a separate file per version (preferred) or at least replacing the local copy whenever I request a version different from the cached one.
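A manual fallback I could live with (an untested sketch, not a built-in fsspec feature) would be to keep one local copy per (path, version) pair myself:

import s3fs

s3 = s3fs.S3FileSystem(version_aware=True)
local_path = f"/tmp/aws/my_file.txt.{version_id}"  # one local copy per version
with s3.open("my_bucket/my_file.txt", "rb", version_id=version_id) as src:
    with open(local_path, "wb") as dst:
        dst.write(src.read())

But that gives up the change-checking and cache bookkeeping that filecache provides, which is why I'd prefer a library-level solution.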
Is there a way to achieve this with the current state of the libraries, or is it currently not possible? I am using s3fs and fsspec, both at version 2022.3.0.