
This is my first attempt working with data in a .nc file. I manually downloaded the dataset (https://downloads.psl.noaa.gov/Datasets/noaa.oisst.v2/sst.ltm.1961-1990.nc) and saved it to my computer. However, I'm having difficulty extracting the data for analysis.

library(ncdf4)

# open data file
sst <- nc_open("./Data_Covariates/Raw/sst.ltm.1961-1990.nc")

# view metadata
print(sst)

# extract data
lat = ncvar_get(sst, "lat")
lon = ncvar_get(sst, "lon")
date = ncvar_get(sst, "time")
sstVar = ncvar_get(sst, "short sst") 
# From the metadata, I assumed that "short sst" was the name of the sea surface temperature variable, but I get the following error:
Error in vobjtovarid4(nc, varid, verbose = verbose, allowdimvar = TRUE) : 
      Variable not found

Additionally, the metadata states that the unit for time is days since 1800-01-01 00:00:0.0. However, the values are all negative numbers, and I'm unsure how to convert them to actual dates.

tnt

2 Answers


The variable name is `sst`; it is an array with 3 dimensions.

sstVar = ncvar_get(sst, "sst") 
dim(sstVar)
# [1] 360 180  12
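
To pull one month or one grid cell out of an array shaped like this, index the relevant dimension. A minimal sketch on a synthetic array with the same shape, assuming the dimension order is longitude, latitude, month (the name `fake_sst` and its values are placeholders, not the real data):

```r
# Synthetic stand-in for sstVar: lon x lat x month (placeholder values)
fake_sst <- array(seq_len(360 * 180 * 12), dim = c(360, 180, 12))

# One month: a 360 x 180 matrix (lon in rows, lat in columns)
january <- fake_sst[, , 1]
dim(january)

# All 12 months for a single grid cell: a vector of length 12
cell_series <- fake_sst[10, 20, ]
length(cell_series)
```
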

And for date, we need origin:

as.Date(date, origin = "1800-01-01")
# [1] "0000-12-30" "0001-01-30" "0001-02-27" "0001-03-30" "0001-04-29"
# [6] "0001-05-30" "0001-06-29" "0001-07-30" "0001-08-30" "0001-09-29"
# [11] "0001-10-30" "0001-11-29"
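
The `origin` argument works the same way for any "days since" offset. A small sketch with made-up offsets (not values read from this file) showing the round trip between a date and its day count, and what a negative count does:

```r
origin <- as.Date("1800-01-01")

# Day count for a known date, measured from the origin
days <- as.numeric(as.Date("1961-01-01") - origin)

# Converting the count back recovers the date
back <- as.Date(days, origin = origin)
format(back)
# [1] "1961-01-01"

# Negative counts land before the origin
format(as.Date(-1, origin = origin))
# [1] "1799-12-31"
```
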
zx8754
  • Thanks @zx8754! The dates seem a bit weird to me though. The data set is supposed to start in 1960, but this date format starts at "0000-12-30". – tnt Dec 05 '22 at 21:19

The data starts in 1961, so I think we need to use the ltm_range attribute. Here's a full reprex, including plots:

library(ncdf4)

sst <- nc_open("sst.ltm.1961-1990.nc")
lat = ncvar_get(sst, "lat")
lon = ncvar_get(sst, "lon")
date = cumsum(c(0, diff(ncvar_get(sst, "time")))) + 
        ncatt_get(sst, "time")$ltm_range[1]
date = as.Date(date, origin = "1800-01-01")
sstVar = ncvar_get(sst, "sst") 

df <- data.frame(lat = rep(rep(lat, each = length(lon)), length(date)), 
                 lon = rep(rep(lon, length(lat)), length(date)),
                 date = rep(date, each = length(lat) * length(lon)),
                 sst = as.vector(sstVar))

library(ggplot2)

ggplot(df, aes(lon, lat, fill = sst)) +
  geom_raster() +
  scale_fill_viridis_c(na.value = "gray50") +
  facet_wrap(.~date) +
  theme_minimal()

[Plot: maps of sst by lon and lat, one facet per date]
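
The nested `rep()` calls work because `as.vector()` unrolls an array with the first dimension varying fastest. `expand.grid()` generates rows in that same order, so it may be an easier way to build the coordinate columns. A sketch on a tiny synthetic array (the names `lon_s`, `lat_s`, `date_s`, `arr`, and `df_s` are placeholders, and the values are made up):

```r
lon_s  <- c(0.5, 1.5, 2.5)
lat_s  <- c(-10, 10)
date_s <- as.Date(c("1961-01-01", "1961-02-01"))

# Synthetic lon x lat x date array
arr <- array(seq_len(3 * 2 * 2), dim = c(3, 2, 2))

# expand.grid varies its first argument fastest, matching as.vector(arr)
df_s <- expand.grid(lon = lon_s, lat = lat_s, date = date_s)
df_s$sst <- as.vector(arr)

# Spot check: the row for lon index 2, lat index 1, date index 2
hit <- df_s[df_s$lon == 1.5 & df_s$lat == -10 & df_s$date == date_s[2], ]
hit$sst == arr[2, 1, 2]
```

If the array's dimensions were stored in a different order, the arguments to `expand.grid()` would need to be reordered to match.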

Allan Cameron
  • Thanks @Allan! Unfortunately, I'm getting the following error when I run the first date line: Error in ncvar_get(sst, "time") : first argument (nc) is not of class ncdf4! – tnt Dec 05 '22 at 21:42
  • @tnt have you got a typo somewhere? `ncvar_get(sst, "time")` is exactly the same as your own code, and you can see from the plot it definitely works. Have you checked that `sst` exists after running the first couple of lines? – Allan Cameron Dec 05 '22 at 22:04
  • I did have a typo! But after correcting that, I only have dates for 1961, but the dataset is supposed to run from 1961-1990, so I think there might still be something missing. – tnt Dec 05 '22 at 22:14
  • I think the issue might be ncatt_get(sstOld, "time")$ltm_range[1], as this simply pulls in the minimum value but doesn't allow the year to increase after it has cycled through cumsum(c(0, diff(ncvar_get(sstOld, "time")))). Therefore, I only end up with 12 dates instead of 360 dates (12 months * 30 years). – tnt Dec 05 '22 at 23:20
  • If you look at the object sst, it says it only has 12 dates in it (the dates dimension has length 12, and the sstVar object is an array with only 12 slices). I think there are only the 12 months of 1961 in this particular file (I know its name suggests otherwise, but we can see all the data it contains) – Allan Cameron Dec 05 '22 at 23:29