
I have a somewhat complex directory structure of NetCDF files for which I want to create a THREDDS catalog:

/data/buoy/A0121/realtime/A0121.met.realtime.nc
                         /A0121.waves.realtime.nc
                         etc.
/data/buoy/A0122/realtime/A0122.met.realtime.nc
                         /A0122.sbe37.realtime.nc
                         etc.
/data/buoy/B0122/realtime/B0122.met.realtime.nc
                         /B0122.sbe37.realtime.nc
etc.

But I have found that the regExp attribute, in both datasetScan filters and NcML aggregation/scan elements, does not seem able to match a pattern that spans subdirectories. For example, this catalog entry works:

<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime"
   location="/data/buoy/B0122" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <include regExp="realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>

But the following does not; no datasets are found:

<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime" 
  location="/data/buoy" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <include regExp="B0122/realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>

This is a greatly simplified example, done only to confirm that regExp does not match across subdirectories, which is implied at the bottom of this NcML page: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/ncml/v2.2/AnnotatedSchema4.html

My real goal is to use NcML aggregation via <scan regExp="">.

Should I be using FeatureCollections? These are pretty simple time series buoy observation files.

Rich Signell
Eric Bridger

2 Answers


If you are scanning files for an <aggregation> and want to include subdirectories, you can add subdirs="true" to the <scan> element, for example:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="ocean_time" type="joinExisting">
        <scan location="." regExp=".*vs_his_[0-9]{4}\.nc$" subdirs="true"/>        
    </aggregation>
</netcdf>
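Applied to the layout in the question, one way to get the effect of regExp="B0122/realtime" in an aggregation is to move the directory part of the pattern into location and let regExp match only the file name. This is a sketch under assumptions: the record dimension is guessed to be named time, and the post-recovery directory (taken from the filter in the other answer) is assumed to hold files named like B0122.met.*.nc. Adjust both to the real files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <!-- Assumption: the time dimension in the buoy files is named "time" -->
    <aggregation dimName="time" type="joinExisting">
        <!-- Directory constraint lives in location; regExp only has to match the file name -->
        <scan location="/data/buoy/B0122/realtime/" regExp=".*\.met\.realtime\.nc$" subdirs="false"/>
        <!-- Assumption: post-recovery met files follow a B0122.met.*.nc naming pattern -->
        <scan location="/data/buoy/B0122/post-recovery/" regExp=".*\.met\..*\.nc$" subdirs="false"/>
    </aggregation>
</netcdf>
```

The leading .* lets each pattern match whether regExp is tested against the bare file name or the full path. Listing the two directories explicitly with subdirs="false" also keeps any old subdirectory out of the aggregation; as far as I know, scan has no exclude element comparable to the datasetScan filter.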

For datasetScan datasets, the filters are applied automatically at every subdirectory level, so to apply those filters to all the buoy subdirectories you could just do:

<datasetScan name="All TEST REALTIME" ID="all_test_realtime" path="/All/Realtime" 
  location="/data/buoy" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <include regExp="realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>
Rich Signell
  • I realize that subdirs are searched. My question is really about why regExp="realtime" works and regExp="B0122/realtime" does not. I found this to be the case in both datasetScan and scan regExp attributes. I had been hoping to leverage the existing directory structure to limit the number of subdirs searched. Luckily in my case the file names contain all the information needed to find what I'm after. – Eric Bridger Oct 17 '13 at 15:52
  • Did you try `regExp=".*B0122/realtime"`? If that works, it would indicate that `regExp` is matched against full path names. But it seems more likely that `regExp` is matched against each local directory or file name as the scan walks the path, and so never finds a match for `B0122/realtime`. – Rich Signell Oct 18 '13 at 17:31
  • I did try regExp=".*B0122/realtime" with no luck. I've raised this with the Unidata developers via the list, and I think you are correct, which makes sense programmatically: walk the path and check each name found against the regExp. I did have success using two filter includes. See the example below. – Eric Bridger Oct 21 '13 at 19:24
<filter>
  <include regExp="[A-Z]{1}[0-9]{4}" atomic="false" collection="true" />
  <include wildcard="realtime" atomic="false" collection="true" />
  <include wildcard="post-recovery" atomic="false" collection="true" />
  <include wildcard="*.nc" />
  <!-- exclude directory -->
  <exclude wildcard="old" atomic="false" collection="true" />
</filter>
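Following the same idea, a pattern such as B0122/realtime can be split into one include per directory level, assuming (as the comments above suggest) that each collection include is tested against a single directory name as the scan walks the tree. Below is a sketch that publishes only the B0122 realtime files while keeping location="/data/buoy"; the name, ID, and path values are made up for illustration.

```xml
<datasetScan name="B0122 REALTIME" ID="b0122_realtime" path="/B0122/Realtime"
  location="/data/buoy" >
  <metadata inherited="true">
    <serviceName>all</serviceName>
  </metadata>
  <filter>
    <!-- One collection include per path level instead of "B0122/realtime" -->
    <include wildcard="B0122" atomic="false" collection="true" />
    <include wildcard="realtime" atomic="false" collection="true" />
    <include wildcard="*.nc" />
    <!-- exclude directory -->
    <exclude wildcard="old" atomic="false" collection="true" />
  </filter>
</datasetScan>
```

At the /data/buoy level only B0122 passes the collection includes, so A0121 and A0122 should never be descended into; inside B0122, realtime passes and old is excluded.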
Eric Bridger