
I would like to verify my understanding of how Python objects behave in this example.

Say I have, on a laptop with limited memory, a very large netCDF4 dataset, for example with a million points along the unlimited dimension "time", whose units are seconds since 2015-11-12 16:0:8.000000 0:00. I want to access the very first and the very last time as datetime objects without loading all the values into memory.

Now I know I can get at the first and last dates as datetime objects with this code:

import netCDF4 as nc4
from netCDF4 import Dataset
cdf = Dataset(fname,mode="r",format='NETCDF4')
time_var = cdf.variables['time']
dtime = nc4.num2date(time_var[0:10],time_var.units)
print('data starts at %s' % dtime[0])

The print statement gives me what I want:
"data starts at 2015-11-12 16:00:08"

Did Python load all the 'time' data into memory to do this? Or, as I have come to understand from using MATLAB, is cdf now just a pointer to the 'time' variable in the open file?

Many thanks, Marinna

Marinna Martini
    I don't think your code will give you the first and last time; `time_var[0:10]` reads the first through the tenth element (indices 0 to 9) of the `time` variable. If you want the first and the last element, read `time_var[0]` and `time_var[-1]`. As far as I know, that reads only the first and last element into memory. – Bart Jul 03 '17 at 20:58

1 Answer


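Yes. cdf is a handle (a pointer or view) into the open file, not a copy in memory, and time_var behaves the same way: values are read from disk only when you index or slice the variable. This answer discusses it: https://stackoverflow.com/a/4371049/1211981 As @Bart mentioned, you should just use: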

dtime = nc4.num2date(time_var[0],time_var.units)

and

dtime2 = nc4.num2date(time_var[-1],time_var.units)

to get the times you want. Each call reads a single value from the file, so there is no big copy into memory.
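To make the "handle, not copy" idea concrete, here is a minimal standard-library sketch of the same access pattern. It is only an analogy and not how netCDF4 is implemented: the flat binary file, the `read_one` helper, and the sample count are all hypothetical stand-ins for the "time" variable. It seeks to the first and last stored value, reads just those 8 bytes each, and adds them to the reference time from the units string.

```python
import os
import struct
import tempfile
from array import array
from datetime import datetime, timedelta

# Reference time taken from the units string in the question.
EPOCH = datetime(2015, 11, 12, 16, 0, 8)
ITEM = struct.calcsize("=d")  # 8 bytes per float64, native byte order


def read_one(f, index, count):
    """Read a single float64 at `index`; negative indices count from the end."""
    if index < 0:
        index += count
    f.seek(index * ITEM)
    return struct.unpack("=d", f.read(ITEM))[0]


# Hypothetical stand-in for the "time" variable: one sample per second.
n = 100_000
path = os.path.join(tempfile.mkdtemp(), "time.bin")
with open(path, "wb") as f:
    array("d", (float(i) for i in range(n))).tofile(f)

# Only two 8-byte reads happen here; the rest of the file stays on disk.
with open(path, "rb") as f:
    count = os.path.getsize(path) // ITEM
    first = EPOCH + timedelta(seconds=read_one(f, 0, count))
    last = EPOCH + timedelta(seconds=read_one(f, -1, count))

print("data starts at %s" % first)
print("data ends at %s" % last)
```

In the netCDF4 case, `time_var[0]` and `time_var[-1]` play the role of `read_one`, and `nc4.num2date` does the conversion from the units string.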

Eric Bridger