My typical workflow is to download large netCDF datasets and then subset them to a single lat/lon gridpoint. Since I frequently need only a single gridpoint for one or a few variables (e.g., air temperature, precipitation), I would like to subset large datasets such as CMIP6 efficiently prior to downloading, so that the download itself is small.

So far I have tried esgf-pyclient, but extracting one variable at a single gridpoint for the years 1850-2100 (~91,675 days/rows of data) can take upwards of an hour, which defeats the purpose of subsetting before downloading. Bandwidth is not the bottleneck: my wired (ethernet) download speed is > 1 Gbps. Any suggestions or alternative workflows would be appreciated!
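For context, the post-download step is just a nearest-neighbour selection with xarray on the local file (the filename below is a hypothetical placeholder):

import xarray as xr

# Hypothetical local file that was downloaded in full beforehand.
ds = xr.open_dataset('tasAdjust_day_local.nc')
point = ds['tasAdjust'].sel(lat=32.298583, lon=-97.78538710, method='nearest')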
Code I am using for esgf-pyclient:
from pyesgf.search import SearchConnection
import xarray as xr

# Search the ESGF federation for daily bias-adjusted air temperature.
conn = SearchConnection('https://esgf-data.dkrz.de/esg-search', distrib=True)
ctx = conn.new_context(
    product='input',
    project='ISIMIP3b',
    # model='GFDL-ESM4',
    experiment='historical',
    variable='tasAdjust',  # also need tasminAdjust, tasmaxAdjust, prAdjust
    time_frequency='day',
    data_node='esg.pik-potsdam.de'
)
ctx.hit_count  # number of matching datasets

# Take the first matching dataset and list its files.
result = ctx.search()[0]
result.dataset_id
files = result.file_context().search()

# Open the first file remotely over OPeNDAP and select the nearest gridpoint.
ds = xr.open_dataset(files[0].opendap_url).sel(lat=32.298583, lon=-97.78538710, method="nearest")
The desired output is a single column/vector of ~91,675 daily values for the chosen gridpoint (lat/lon).
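For completeness, here is a minimal sketch of how the full series gets assembled under this approach, assuming the dataset's files tile the time axis (the variable name matches the search facet above). The .load() call is the step where data is actually transferred, and where the hour goes:

urls = [f.opendap_url for f in files]

# Lazily open all files over OPeNDAP, concatenated along time,
# then select just the one gridpoint across the whole record.
ds = xr.open_mfdataset(urls, combine='by_coords')
point = ds['tasAdjust'].sel(lat=32.298583, lon=-97.78538710, method='nearest')

# Only now is data transferred; this is the slow step.
series = point.load()  # 1-D DataArray of ~91,675 daily values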