NcML aggregation of remote THREDDS catalog

Question

I want to aggregate all files within a specific directory of a remote THREDDS catalog. These are GRIB2 files from the NAM forecast. This is the main list of directories for each month. Here is my NcML file for aggregating this catalog of files:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
    <aggregation dimName="time" type="joinExisting">
        <scan location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/" regExp="^.*\.grb2$" subdirs="false"/>
        <dimension name="time" orgName="t" />
    </aggregation>
</netcdf>

I am mostly interested in two variables from these files: u-component_of_wind_height_above_ground and v-component_of_wind_height_above_ground.

I am not sure whether this aggregation is correct for a remote catalog. The NcML file above produces this error:

There are no datasets in the aggregation DatasetCollectionManager{ collectionName='http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/^.*\.grb2$' recheck=null dir=http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/ filter=^.*\.grb2$

How should this NcML file be written?

Thanks.

score 2 · Accepted Answer · answered Jul 28 '18 at 12:55

You cannot glob remote URLs, so you need to list each OPeNDAP endpoint explicitly in the aggregation, like:

<dataset name="Nam218" urlPath="nam218">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <netcdf location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/<file01>.grb2"/>
      <netcdf location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/<file02>.grb2"/>
      <netcdf location="http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/<file03>.grb2"/>
    </aggregation>
  </netcdf>
</dataset>
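
Writing one `<netcdf location="..."/>` line per file by hand gets tedious, so the inner `<netcdf>` aggregation element above could be generated instead. The following is a minimal sketch in Python (not part of the original answer), assuming you already have the list of OPeNDAP URLs. The function name `build_ncml` and the file names in the usage section are hypothetical placeholders, not real files on the server.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_ncml(urls, dim_name="time"):
    """Return NcML text for a joinExisting aggregation that lists each URL."""
    # Emit the NcML namespace as the default namespace (no prefix).
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": dim_name, "type": "joinExisting"},
    )
    for url in urls:
        # One nested <netcdf location="..."/> element per remote endpoint.
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", {"location": url})
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    base = "http://www.ncei.noaa.gov/thredds/dodsC/nam218/201807/20180723/"
    # Placeholder names: substitute the real .grb2 file names from the catalog.
    names = ["file01.grb2", "file02.grb2", "file03.grb2"]
    print(build_ncml([base + name for name in names]))
```

The output can be saved as a standalone `.ncml` file or pasted inside the `<dataset>` element shown above.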

score 0 · Answer 2 · answered Aug 15 '18 at 14:09

You can write a simple command-line program (I used C++ on Windows). Mine launches a BAT file that runs wget to download the latest THREDDS catalog and save it as plain text; the C++ program then loads the entire file into a string, which I parse to do what I want with the data.
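
The parsing step described here could look roughly like the sketch below. It uses Python rather than the C++ and wget setup this answer mentions, and it assumes the catalog was saved as THREDDS catalog XML (the InvCatalog 1.0 namespace) in which each file appears as a `<dataset>` element with a `urlPath` attribute. The helper name `grib2_url_paths` and the `SAMPLE` catalog are made up for illustration.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def grib2_url_paths(catalog_xml):
    """Return the sorted urlPath values of all .grb2 datasets in a catalog."""
    root = ET.fromstring(catalog_xml)
    paths = []
    # iter() walks nested <dataset> elements at any depth.
    for dataset in root.iter(f"{{{THREDDS_NS}}}dataset"):
        url_path = dataset.get("urlPath")
        if url_path and url_path.endswith(".grb2"):
            paths.append(url_path)
    return sorted(paths)


# Made-up catalog fragment, used only to show the expected input shape.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="20180723">
    <dataset name="b.grb2" urlPath="nam218/201807/20180723/b.grb2"/>
    <dataset name="a.grb2" urlPath="nam218/201807/20180723/a.grb2"/>
    <dataset name="a.inv" urlPath="nam218/201807/20180723/a.inv"/>
  </dataset>
</catalog>"""

if __name__ == "__main__":
    for path in grib2_url_paths(SAMPLE):
        print(path)
```

Prefixing each returned path with the server's `dodsC` base URL would give OPeNDAP endpoints rather than download links.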

David


Thanks for your response, but that's not the question. If I wanted to download and then aggregate, I could simply use `netCDF4.MFDataset` (one line does everything you mentioned). The question is about using NCL, which allows customizing the data on the server side _before_ downloading terabytes. – Sia Aug 16 '18 at 19:25
