Reading xarray goes16 data directly from S3 without downloading into the system

Question

I want to read GOES-16 data with xarray directly from S3, without downloading it to the local system. The issue is that I cannot concatenate S3File objects. I am retrieving 24 files from S3 and want to read and extract the data from these files for the time range.

This is the code:

import datetime as dt
import xarray as xr
import fsspec
import s3fs

fs = fsspec.filesystem('s3', anon=True)

urls1 = []

for i in range(2):
    urls = [
        's3://' + f
        for f in fs.glob(f"s3://noaa-goes16/ABI-L2ACMC/2022/001/{i:02}/*.nc")
    ]
    urls1 = urls1 + urls

with fs.open(urls1[0]) as fileObj:
    ds = xr.open_dataset(fileObj, engine='h5netcdf')

However, I run into this error: I/O operation on closed file.

Michael Delgado · Answer 1 · 2022-11-08T01:18:52.457


As with most file-like interfaces in Python, opening a file object with a context manager closes it on exit. So in the following example, the dataset is left pointing at a closed file:

# use fs.open to create an S3File object
with fs.open(urls1[0], mode="rb") as fileObj:
    # open the netcdf for reading, but don't load the data - instead, just
    # establish a lazy-load connection to the underlying S3File object
    ds = xr.open_dataset(fileObj, engine='h5netcdf')

# <--
# exit the context, thereby closing the S3File object

# attempt to access the data again, after the stream is closed
ds.load()  # raises an error: I/O operation on closed file
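
The same failure can be reproduced without S3 or xarray. This is a minimal sketch, assuming an in-memory `io.BytesIO` as a stand-in for the S3File object (the stand-in and its contents are illustrative, not part of the original code):

```python
import io

# Hypothetical stand-in for an S3File: any file-like object behaves the same way.
stream = io.BytesIO(b"netcdf bytes")

with stream as file_obj:
    # Reading inside the context works, because the stream is still open.
    header = file_obj.read(6)

# Leaving the `with` block has closed the stream.
print(stream.closed)

message = None
try:
    # Any read after the context exits fails, just like a lazy ds.load().
    stream.read()
except ValueError as err:
    message = str(err)
    print(message)
```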

Instead, you should either load all the data within the context manager:

with fs.open(urls1[0], mode="rb") as fileObj:
    with xr.open_dataset(fileObj, engine='h5netcdf') as ds:
        ds = ds.load()

Or, if you're planning to use the dataset in later code without loading:

fileObj = fs.open(urls1[0], mode="rb")
ds = xr.open_dataset(fileObj, engine='h5netcdf')

# other data operations

# be sure to close the connections when you're done
ds.close()
fileObj.close()
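
For the 24 files in the question, the first option (load inside the context manager) can be repeated in a loop. The sketch below shows only the open/load/close pattern; `load_each`, `open_fake` and `fake_bucket` are hypothetical names, and in-memory bytes stand in for the S3 objects so that it runs offline:

```python
import io


def load_each(open_file, paths, load):
    """Open each path in turn and fully read it before its file object closes."""
    loaded = []
    for path in paths:
        with open_file(path) as file_obj:
            # Everything needed from the file must be read here,
            # because the file object is closed when this block exits.
            loaded.append(load(file_obj))
    return loaded


# Hypothetical in-memory stand-ins for the S3 objects (assumption, for illustration).
fake_bucket = {"hour00.nc": b"data-00", "hour01.nc": b"data-01"}


def open_fake(path):
    return io.BytesIO(fake_bucket[path])


contents = load_each(open_fake, sorted(fake_bucket), lambda f: f.read())
print(contents)
```

With the real data, the call would presumably look like `load_each(fs.open, urls1, lambda f: xr.open_dataset(f, engine="h5netcdf").load())`, and the resulting datasets could then be combined with `xr.concat`. Which dimension to concatenate along depends on the files and is not covered here.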

edited Nov 08 '22 at 01:18

answered Jun 11 '22 at 21:31


Thank you @Michael. It seems that does not support netcdf4. This is the error: ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf' – Naj_m_Om Nov 08 '22 at 01:17
you're right - you need to use a read mode, e.g. `mode="rb"`. I've updated my answer. in my defense I copied that part of the code from you ;) hope that helps! – Michael Delgado Nov 08 '22 at 01:19
That is great!! Happy that helped you :) Thank you so much for the reply. I updated the code as you mentioned with the read mode, but it still gives an error:
    filename_or_obj = _normalize_path(filename_or_obj)
    store = NetCDF4DataStore.open(
        filename_or_obj,
        mode=mode,
        format=format,
        group=group,
        clobber=clobber,
        diskless=diskless,
        persist=persist,
        lock=lock,
        autoclose=autoclose,
– Naj_m_Om Nov 08 '22 at 01:41
it still gives the same error: ValueError: can only read bytes or file-like objects with engine='scipy' or 'h5netcdf' – Naj_m_Om Nov 08 '22 at 01:47
this doesn't happen for me with a sample dataset - the code I wrote works fine for an h5netcdf encoded file. the original issue you asked about 5 months ago was definitely the indentation issue - if you're having a different issue now please feel free to create a full [mre] and ask a new question :) – Michael Delgado Nov 08 '22 at 02:40
I was trying to open netcdf4 files, but you did it for h5netcdf – Naj_m_Om Nov 08 '22 at 02:54
no, you specified `h5netcdf` in your original question. you can always read the xarray docs and adapt the code to read a new file, or ask a new question if you have a different problem. – Michael Delgado Nov 08 '22 at 02:56
