3

I'm trying to calculate the SPI from CHIRPS monthly mean precipitation data, because it's too large I cut it down to my area of interest and here it is: https://www.dropbox.com/s/jpwcg8j5bdc5gq6/chirps_mensual_v1.nc?dl=0 I did this to open it:

require(utils)
require(colorRamps)
require(RNetCDF)
require(rasterVis)
require(rgdal)
library(ncdf4)
library(raster)


datos2 <- nc_open("Datos/chirps_mensual_v1.nc")
ppt_array <- ncvar_get(datos2, "precip")

#I'm only taking complete years so I took out two months from 2018

ppt_mes <- ppt_array[ , ,1:444]

I know there is a SPI library but I don't know how should I format the data in order to use it. So I tried to do it without the function by fitting the gamma distribution but I dont' know how to do it for this data base.

Does anyone know how to calculate SPI either with the function or by fitting the distribution?

Ann M
  • 239
  • 2
  • 14

2 Answers2

4

I don't think the SPI package is doing what you (or anyone) thinks it is doing. If you use debug(spi) and step through the code, you'll see that in one step it fits a empirical cumulative distribution function (with ecdf()) to the first two and last rows of data. Why the first two and last rows? I have no clue, but whoever wrote this package also used a for loop to do t() to a matrix. Not to mention that I think it should use a Gamma distribution or Pearson III distribution not ecdf() (according to Guttman, N.B. (1999) Accepting the standardized precipitation index: a calculation algorithm. JAWRA Journal of the American Water Resources Association, 35, 311–322.).

Eric Scott
  • 96
  • 5
  • To be honest I've moved from R and started using Python, but I didn't knew this, I tought when I declare the "distribution" option in the R function it fitted the data to that. I think you are making an interesting point since I think a lot of people use this package. – Ann M Jun 29 '20 at 16:28
  • `SPEI::spi()` lets you specify the distribution, and seems to be doing the correct thing. The `spi` package is the one that uses `ecdf()` and there doesn't seem to be a way to specify another distribution. – Eric Scott Jun 30 '20 at 18:46
  • @EricScott there's good reasons to use a non-parametric approach such as the ecdf (alternative would be a kde: kernel density estimation), you get the exact distribution of your data without assuming that it follows a specific distribution function. If you use a distribution function, you should also check it's feasibility (which is rarely done). – Felix Phl Jul 21 '20 at 10:19
  • 1
    @FelPhl Sure, a non-parametric approach might be good, but the `spi` package claims to implement a published algorithm that uses a Gamma distribution. It's also only the most minor of several problems. The example dataset in the package only includes January–July and calculating SPI requires **continuous** monthly data. Also, as I mentioned before, the function only uses the first two and last rows of data for a calculation in one step. This is almost definitely a mistake as these three rows are totally arbitrary. I would steer clear of this package and use `SPEI` instead. – Eric Scott Jul 22 '20 at 15:39
1

At the end I made it by using the SPI library, the result will be a value for each month in each grid point, if you want to calculate the value over a specific area I made that too but I can share it if you want it too:

Also, this one I made it using CRU data but you can adjust it:

#spei cru 1x1
rm(list=ls(all=TRUE)); dev.off()

require(utils)
require(RNetCDF)
require(rasterVis)
require(rgdal)
library(ncdf4)
require(SPEI)

########################################################################################################


prec <- open.nc("pre_mensual.nc")

lon <- length(var.get.nc(prec, "lon"))
lat <- length(var.get.nc(prec, "lat"))
lon1 <- var.get.nc(prec, "lon")
lat1 <- var.get.nc(prec, "lat")
ppt  <- var.get.nc(prec, "pre") 
ppt  <- ppt[ , ,109:564] #31 18 456 (1980-2017)
anio = 456/12

###########################################################################################################
#Reshape data 

precip <- sapply(1:dim(ppt)[3], function(x)t(ppt[,,x]))

############################################################################################################
#This is for SPI-6, you can use either of them

spi_6 <- array(list(),(lon*lat))

for (i in 1:(lon*lat)) {
  spi_6[[i]] <- spi(precip[i,], scale=6, na.rm=TRUE)
}
#############################################################################################################
#Go back to an array form

sapply(spi_6, '[[',2 )->matriz_ppt 
ppt_6 <- array(aperm(matriz_ppt, c(2,1),c(37,63,456)));spi_c <- array(t(ppt_6), dim=c(37,63,456))
#############################################################################################################
    #Save to netcdf

for(i in 1:456) { 
  nam <- paste("SPI", i, sep = "")
  assign(nam,raster((spi_c[ , ,i]), xmn=min(lon1), xmx=max(lon1), ymn=min(lat1), ymx=max(lat1), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs+ towgs84=0,0,0")) )
}

gpcc_spi <- stack(mget(paste0("SPI", 1:456)))

outfile <- "spi6_cru_1980_2017.nc"
crs(gpcc_spi) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" 
writeRaster(gpcc_spi, outfile, overwrite=TRUE, format="CDF", varname="SPEI", varunit="units",longname="SPEI CRU", xname="lon", yname="lat")

It's not the most stylish way to calculate it but it does work. :)

EDIT: If you want to calculate the SPI/SPEI over an area this is what I did:

library(SPEI)
library(ncdf4)
library(raster)
#

pre_nc <- nc_open("pre_1971_2017_Vts4.nc")
pre <- ncvar_get(pre_nc, "pre")
pre <- pre[, , 109:564] #This is for the time I'm interested in
lats <- ncvar_get(pre_nc, "lat")
lons <- ncvar_get(pre_nc, "lon")
times <- 0:467

# Read mask

#This is a mask you need to create that adjusts to your region of interest
#It consist of a matrix of 0's and 1's, the 1's are placed in the area
#you are interested in

mask1 <- nc_open("cuenca_IV_CDO_05_final.nc")
m1 <- ncvar_get(mask1, "Band1")
m1[m1 == 0] <- NA
#
# Apply mask to data
#
pre1 <- array(NA, dim=dim(pre))

#
for(lon in 1:length(lons)){
  for(lat in 1:length(lats)){
    pre1[lon,lat,] <- pre[lon,lat,]*m1[lon,lat]
  } 
}

#
# Mean over the area of interest
#
mean_pre1 <- apply(pre1,c(3),mean, na.rm=TRUE)

# Calculate SPI/SPEI

spi1 <- matrix(data= NA, nrow = 456, ncol = 48)
for (i in 1:48) {
  spi1[,i] <- spi(data=ts(mean_pre1,freq=12),scale= i)$fitted
}

#This calculates SPI/SPEI-1 to SPI/SPEI-48, you can change it
# Save
#
write.table(spi1,'spi_1980_2017.csv',sep=';',row.names=FALSE)
Ann M
  • 239
  • 2
  • 14
  • 1
    This `pet <- var.get.nc(pet, "pet")` is extra line. The `varname` should be SPI as you are calculating SPI. – UseR10085 Dec 21 '19 at 04:51
  • 1
    `precip <- sapply(1:dim(resta)[3], function(x)t(resta[,,x]))` line is throwing error: `object 'resta' not found`. You have not defined `resta`. – UseR10085 Dec 26 '19 at 11:23
  • if you plot the output coming from your first chunk of code using `plot(gpcc_spi)`, it is coming upside down. I think the code needs further corrections. – UseR10085 Dec 28 '19 at 05:53
  • I don't understand what you mean when you say you plot "the first chunk" The final result shows the map correctly wheter you plot them or open them on something like panoply. If you keep having questions you can e-mail me at: aemm@atmosfera.unam.mx – Ann M Dec 29 '19 at 01:50
  • @AnnM, have you updated this code? – Mukhtar Abdi Aug 04 '23 at 12:23