6

Using R's raster package, I have a brick obtained from a file, with the following ncdump header (I show a small example file, the actual file is much bigger):

dimensions:
        lon = 2 ;
        lat = 3 ;
        time = UNLIMITED ; // (125000 currently)
variables:
        float lon(lon) ;
                lon:standard_name = "longitude" ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
        float lat(lat) ;
                lat:standard_name = "latitude" ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
        double time(time) ;
                time:standard_name = "time" ;
                time:long_name = "Time" ;
                time:units = "seconds since 2001-1-1 00:00:00" ;
                time:calendar = "standard" ;
                time:axis = "T" ;
        short por(time, lat, lon) ;
                por:_FillValue = 0s ;
                por:missing_value = 0s ;

And in R:

class       : RasterBrick 
dimensions  : 3, 2, 6, 125000  (nrow, ncol, ncell, nlayers)
resolution  : 0.008999825, 0.009000778  (x, y)
extent      : 6.4955, 6.5135, 44.0955, 44.1225  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
data source : /home/clima-archive/afantini/chym/chym_output/test.nc 
names       : X0, X3600, X7200, X10800, X14400, X18000, X21600, X25200, X28800, X32400, X36000, X39600, X43200, X46800, X50400, ... 
z-value     : 0, 449996400 (min, max)
varname     : por

However, for faster access and higher compression two of the file dimensions have been swapped, so that the chunking is better for the kind of use that we need. So the file will be like this (link to download the 1MB file):

dimensions:
        lon = UNLIMITED ; // (2 currently)
        lat = 3 ;
        time = 125000 ;
variables:                                                                                                                                                                                                                            
        float lon(lon) ;                                                                                                                                                                                                              
                lon:standard_name = "longitude" ;                                                                                                                                                                                     
                lon:long_name = "longitude" ;                                                                                                                                                                                         
                lon:units = "degrees_east" ;                                                                                                                                                                                          
                lon:axis = "X" ;                                                                                                                                                                                                      
        float lat(lat) ;                                                                                                                                                                                                              
                lat:standard_name = "latitude" ;                                                                                                                                                                                      
                lat:long_name = "latitude" ;                                                                                                                                                                                          
                lat:units = "degrees_north" ;                                                                                                                                                                                         
                lat:axis = "Y" ;                                                                                                                                                                                                      
        double time(time) ;                                                                                                                                                                                                           
                time:standard_name = "time" ;                                                                                                                                                                                         
                time:long_name = "Time" ;                                                                                                                                                                                             
                time:units = "seconds since 2001-1-1 00:00:00" ;                                                                                                                                                                      
                time:calendar = "standard" ;                                                                                                                                                                                          
                time:axis = "T" ;                                                                                                                                                                                                     
        short por(lon, lat, time) ;
                por:_FillValue = 0s ;
                por:missing_value = 0s ;

And in R:

class       : RasterBrick 
dimensions  : 3, 125000, 375000, 2  (nrow, ncol, ncell, nlayers)
resolution  : 3600, 0.009000778  (x, y)
extent      : -1800, 449998200, 44.0955, 44.1225  (xmin, xmax, ymin, ymax)
coord. ref. : NA 
data source : /home/clima-archive/afantini/chym/chym_output/test_swapped.nc 
names       : X6.5, X6.50899982452393 
degrees_east: 6.5, 6.50899982452393 
varname     : por

As you can see the file is opened as if the number of columns is 125000. I want to swap the number of columns with the number of layers, without reading all the data. I think from the raster manual I should use layer or lvar, since:

layer: integer. The layer (variable) to use in a multi-layer file, or the layer to extract from a RasterStack/Brick or SpatialPixelsDataFrame or SpatialGridDataFrame. An empty RasterLayer (no associated values) is returned if ‘layer=0’

.......

‘lvar’: integer > 0 (default=3). To select the 'level variable' (3rd dimension variable) to use, if the file has 4 dimensions (e.g. depth instead of time)

But this does not seem to work, setting e.g. layer="time", as it changes nothing.

How can I do this?

AF7
  • 3,160
  • 28
  • 63

1 Answers1

3

If you don't mind reshaping AFTER opening/reading, I think you could just read the data in a variable using ncdf4library and then transpose it. Something like:

nc   <- nc_open(*your_nc_file*)
data <- ncvar_get(nc, por)     # "por" is the name of your variable, right ? 
data_new <- aperm(data, c(1,3,2)) # "transpose" the matrix

a possible problem could be that data_new is no longer a raster* object, but you can recreate one from it easily enough.

HTH,

Lorenzo

lbusett
  • 5,801
  • 2
  • 24
  • 47
  • If I'm not mistaken this would mean reading the whole variable, which is not doable as it is HUGE. I actually only need to have the raster defined correctly (because I do some math on resolution, cell numbers etc.) and read only a small number of 125000-long vectors. – AF7 Oct 01 '16 at 17:21
  • Yes, that would require reading the whole file. and transposing indexes. The problem seems to be that you have a BIL (Band Interleaved by Line) file, but raster opens it as a BSQ (Band Sequential). I'll see if something else come to mind... Would it be ok for you to create a "temporary" accessory raster file ? – lbusett Oct 02 '16 at 07:57
  • Also, could you share a subset of the original netcdf file ? – lbusett Oct 02 '16 at 08:11
  • I surely will when I get back to the office, hopefully tomorrow. Thanks in the meantime. – AF7 Oct 03 '16 at 08:02
  • 1
    Ok. I have an idea, but better to test it on a real file rather than on a mock-up. – lbusett Oct 03 '16 at 17:44
  • Sorry for the delay but I was ill. I edited the question, rephrased it and added a link to an example file. Thanks for taking your time to take a look at this. – AF7 Oct 04 '16 at 12:42
  • So, if Iunderstand well, the original file was "normal" and the "dimensions swap" was done by you, right ? Also, in the file you linked are there any data, or is it all "NA" ? – lbusett Oct 04 '16 at 13:49
  • Yes, the swapping is done by me. It probably is all NA (I did not check), but it does not matter. All I want is that when I open the swapped file, `R` understands (maybe using a flag or option) that it is swapped and that it is the right "shape". – AF7 Oct 04 '16 at 13:53
  • 1
    ok. So I suspect that you wouldn't like to "resave" the data to a different file, even if reading that file would give you the desired result, otherwise you wouldn't have swapped it, right ? Am I right in assuming that you are "swapping" dimensions to improve "spectral processing " speed ? If that's the case, why not going all the way and save in BIP format ? [link] http://idlcoyote.com/ip_tips/where3.html – lbusett Oct 04 '16 at 14:18
  • PS: Having an image with some data would help, since I could check that I am reconstructing the dimensions in the "right" way. – lbusett Oct 04 '16 at 14:19
  • Yes, the reason why this swapping was applied is that we needed more speed in reading the file in the "Z" (time) direction. Chunking with `nccopy` or similar is another option which we are exploring, but for some reason it does not work right now on our machines, thus the dimension swapping is the best workaround right now. Using NetCDF is a requirement, we cannot use another format. – AF7 Oct 04 '16 at 14:32
  • Getting tricky... ;-) Working on the assumption that you don't need the raster for visualization/plotting purposes in IDL (as you wrote in one of your comments): can you manage to do the trasposition so that "time" is used as the first dimension ? that would allow very easy and fast access to spectral data. – lbusett Oct 04 '16 at 15:01
  • Yes, in this case we do not need plotting (and fortunately we do not use IDL at all!) but just data analysis. I could transpose the time dimension to the first position instead than to the second, would this change a lot? I still would not be able to read the file using `raster` and having the correct dimensions set. – AF7 Oct 04 '16 at 15:04
  • It would change because you could access desired data very quickly. **I'd have to test it**, but I think you could access data for a given pixel x,y simply by: ncvar_get(nc, start = c(1,x,y), count = c(-1,1,1). So your only problem would be to know the x-y coordinates of the pixel(s) you want. If you only know lat/lon, getting to x-y wpuld be very easy, since you're working on a regular grid and you can find corner coordinates and resolution of the original file by "gdalinfoing" it. – lbusett Oct 04 '16 at 15:13
  • oh, sorry: I meant "for visualization/plotting purposes in R" ;-) – lbusett Oct 04 '16 at 15:37
  • 1
    By the way: I assume you already saw tthese related posts: http://stackoverflow.com/questions/19936432/faster-reading-of-time-series-from-netcdf and http://stackoverflow.com/questions/22944707/how-can-i-specify-dimension-order-when-using-ncdf4ncvar-get , right ? – lbusett Oct 05 '16 at 07:38
  • Thanks Lorenzo, that's an excellent link indeed. The reason why I'm not chunking but changing variable order is due to the fact that I'm getting coutless random error with nccopy even though the syntax is perfectly fine. In fact, right now I rerun the same exact commands I ran yesterday, on the same files, with the same machine and now it crashes... – AF7 Oct 05 '16 at 07:55
  • I admit I'm getting rather confused by the netcdf/r inconsistencies in treating dimensions order... ;-) My current suggestion would be to: 1) creating a "manageable" small / medium subset of your dataset in "original" order; 2) run nco to swap dimensions to get either time as first and second dimension; 3) run some tests to see changes in speed in accessing a time-slice on a selected number of pixels using ncvar_get. If you are really only interested in getting some time-slices from the data, I think this would be better than trying to make "raster" understand the correct format. – lbusett Oct 05 '16 at 08:08
  • by the way: what heck of a dataset has 125000 layers ? ;-) – lbusett Oct 05 '16 at 08:09
  • Long hourly dataset used for hydrology ;-). We will have much longer datasets in the future for sure, so we need speed. That being said, the question focuses on using `raster` due to the convenience given by that package. I'm not saying I could not do without, but our scripts and programs all already use `raster` and I'd like to be able to easily adapt them to read these "swapped" data for much faster access (in one case I manually and tediously did that, it reduced compute time from 15 minutes to 4 seconds!). Chunking would be ideal, again, but it does not seem to be an option. – AF7 Oct 05 '16 at 08:16
  • mmm... then as far as I can understand you're out of luck due to "limitations" in the `ncdf4` library used by raster to access the ncdf files... the only way I found to get `raster` to understand the interleave is re-saving to a plain binary (ENVI or r raster) and then modifying the text header to set the correct band order... – lbusett Oct 05 '16 at 08:54
  • but then the `raster` option `layers` and `lvar` should be right for this, or not? According to the manual they look like they should work, but they do nothing. Is this a bug or am I missing something? – AF7 Oct 05 '16 at 09:29
  • I also tried to use that but with no luck. I may be mistaken, but I now think that they are used in the case of 4 dimensional ncdf datasets (x, y, time and more than one "variable"), to specify wheter time is on the 3rd or on the 4th dimension. To be sure, maybe you could try to get in touch with the package maintainers. – lbusett Oct 05 '16 at 10:35
  • 1
    Ok, sorry for not being able to help you. Keep me posted if you manage to solve this: now I got curious ! ciao, Lorenzo – lbusett Oct 06 '16 at 08:32