1

I have a similar question to u/Ananas here: Sentinel3 OLCI (chl) Average of netcdf files on Python I am running into similar problems, in so much that I cannot seem to extract the necessary information from the .nc-files and then merge them to create a time-series. In my case,I am trying to do this in R. My current code, which I have followed and customised from here: https://www.youtube.com/watch?v=jWRszWCVWLc&t=1504s , returns an error:

Error in `[<-.data.frame`(`*tmp*`, variable, value = c(0, 0, 0, 0, 0,  : 
  replacement has 1927 rows, data has 2202561

Maybe I am going at it the wrong way from the start and R-s capabilities wiht .nc files are not suited for this? Any suggestions are welcomed.

Here is my code

extract_variable_from_netcdf<- function(nc,variable){
  tryCatch(
    {
      result<-var.get.nc(nc,variable)
      return(result)
    },
    error=function(cond){
      message(paste(variable,"attribute not found"))
      message("Here is the original error message")
      message(cond)
    }
  )
}
extract_global_attribute_from_netcdf<- function(nc,global_attribute){
  tryCatch(
    {
      result<-att.get.nc(nc,"NC_GLOBAL",global_attribute)
      return(result)
    },
    error=function(cond){
      message(paste(global_attribute,"attribute not found"))
      message("Here is the original error message")
      message(cond)
    }
  )
}


folder<- "path to folder"
files<- list.files(folder, pattern= ".nc", full.names = TRUE)

variables<- c("conc_chl", "iop_bpart","lat", "lon") #variables I need to extract
global_attrs<- c("start_date", "stop_date")
headers<-c(global_attrs,variables)

df<-data.frame(matrix(ncol=length(headers), nrow=0))
colnames(df)<- headers
for(file in files) {
  nc<- open.nc(file)
  chl<- var.get.nc(nc, "conc_chl")
  num_chl<- length(chl)
  newdf<- data.frame(matrix(ncol=length(headers), nrow=num_chl))
  colnames(newdf)<- headers
 for (global_attribute in global_attrs) {
   newdf[global_attribute]<-extract_global_attribute_from_netcdf(nc,global_attribute)
 }
  
 for (variable in variables) {
  newdf[variable]<-extract_variable_from_netcdf(nc,variable)
}  

  df<-merge(df,newdf,all=TRUE)
}
Ian
  • 13
  • 2

1 Answers1

0

The way I have used ".nc" files with satellite data, in R. Have been reading it in with the "raster" library as a raster file.

library(raster)

r <- raster("yuor_file.nc")
plot(r) # quick plot to see if everything is as it should be

The way I read in my timeseries was with a loop, and in addition I used a function found from this site somewhere, to covert the raster into a sensible r-data frame

stack overflow function, to convert the loaded raster to data frame

gplot_data <- function(x, maxpixels = 50000)  {
  x <- raster::sampleRegular(x, maxpixels, asRaster = TRUE)
  coords <- raster::xyFromCell(x, seq_len(raster::ncell(x)))
  ## Extract values
  dat <- utils::stack(as.data.frame(raster::getValues(x))) 
  names(dat) <- c('value', 'variable')
  
  dat <- dplyr::as.tbl(data.frame(coords, dat))
  
  if (!is.null(levels(x))) {
    dat <- dplyr::left_join(dat, levels(x)[[1]], 
                            by = c("value" = "ID"))
  }
  dat
}

Read in one file at a time, convert with function and return data.frame

files<- list.files(folder, pattern= ".nc", full.names = TRUE)

fun <- function(i) {
  #read in one file at a time
  r <- raster(files[i])
  
  #convert to normal data frame
  temp <- gplot_data(r)
  temp #output 
}
dat <- plyr::rbind.fill(lapply(1:length(files), fun)) #bind each iteration

Here a plot using ggplot2 and ggforce.

ggplot() +
  geom_tile(data = dat,
            aes(x = x, y = y, fill = value))

Alternatively if you do not know the context of you file, the following, from the "ncdf4" package, will help you inspect it. https://towardsdatascience.com/how-to-crack-open-netcdf-files-in-r-and-extract-data-as-time-series-24107b70dcd

library(ncdf4)
our_nc_data <- nc_open("/your_file.nc")

print(our_nc_data)

# look for the variable names and assign them to vectors that can be bound together in dataframes
lat <- ncvar_get(our_nc_data, "lat") #names of latitude column
lon <- ncvar_get(our_nc_data, "lon") #name of longitude column

time <- ncvar_get(our_nc_data, "time") #the time was called time
tunits <- ncatt_get(our_nc_data, "time", "units")# check units

lswt_array <- ncvar_get(our_nc_data, "analysed_sst") #select the relevant variable, this is temperature named "analysed_sst"
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Jost
  • 366
  • 1
  • 7
  • Thanks for your answer. I have got your script to work, but I have a few questions about the output as I am relatively new to working with spatial data in R. The outputs of the code you provided return four columns x, y, value and variable. What do they represent? As far as I understand x and y are supposed to be coordinates, but even then the values in the cells do not appear to be coordinates... And in the "variable" column it just displays "raster::getValues(x)". Am I supposed to change something in the script to fit my needs? Thanks in advance! – Ian Dec 18 '22 at 10:31
  • For good measure, here is the link to the original stack page, where i got the function, https://stackoverflow.com/questions/47116217/overlay-raster-layer-on-map-in-ggplot2-in-r. which gives an awesome walk trough. But it shouldn't be necessary to change anything in the function, since the names are standard assigned when reading in with the raster package. The x and y, should indeed be coordinates, and the "value" column should contain the z-axis information e.g. temperature. I have made an edit to show how to make a simple plot, just to check whether the file is being red in correctly, – Jost Dec 19 '22 at 09:25
  • Thanks for clarifying. However, if my nc-files contain lots of variables, i.e. chlorophyll, TSM and so on, how would I read in only the necessary variables? I guess I'm confused by what the z-axis in my files is and I am not sure how to check it. – Ian Dec 20 '22 at 10:50
  • Ah, okay. Well in that case you need to start by inspecting your files manually and find out what the different variables are named. and then build you data-frame from there, like from this tutorial "https://towardsdatascience.com/how-to-crack-open-netcdf-files-in-r-and-extract-data-as-time-series-24107b70dcd". once you get one file to work you can try to automate it. I have made another edit with the essentials from the tutorial. But I think that this question, should be marked as answered, and that you could get more help if the question is rephrases and reposted :) – Jost Dec 21 '22 at 07:48