3

The full dataset contains ~11,000 rows. I have been running the code with K=400 whilst checking that the code runs.

All of the rows relate to a specific cell on a map and contain information extracted from Sentinel-2 images and a digital elevation map.

A subset of 117 cells also contain habitat covariates recorded on a field trip. As such, some of the columns, including the response variables (S1 and S2) and tussac, are characterised by a high proportion of NAs.

The code:

add_c4 <- "model{
for(i in 1:K) {
S1[i]~dpois(lambda1[i])
lambda1[i]<-exp(a0+a1*DEM_slope[i]+a2*DEM_elevation[i]+a3*tussac[i]+a4*S2[i])

S2[i]~dpois(lambda2[i])
lambda2[i]<-exp(c0+c1*DEM_slope[i]+c2*DEM_elevation[i]+c3*tussac[i]+c4*S1[i])

muLogit_tussac[i]<-b0 + sentinel1[i] + sentinel3[i] + sentinel7[i] + sentinel8[i] + sentinel9[i] + DEM_slope[i]

Logit_tussac[i]~dnorm(muLogit_tussac[i], tau)
logit(tussac[i])<-Logit_tussac[i]
}

# Priors

a0~dnorm(0, 10)
a1~dnorm(0, 10)
a2~dnorm(0, 10)
a3~dnorm(0, 10)
a4~dnorm(0, 10)

b0~dnorm(0, 10)
b1~dnorm(0, 10)
b2~dnorm(0, 10)
b3~dnorm(0, 10)

c0~dnorm(0, 10)
c1~dnorm(0, 10)
c2~dnorm(0, 10)
c3~dnorm(0, 10)
c4~dnorm(0, 10)

tau~dgamma(0.001, 0.001)

#data# S1, S2, K, sentinel1, sentinel3, sentinel7, sentinel8, sentinel9, DEM_slope, DEM_elevation
#inits# a0, a2, a3, a4, b0, b1, b2, b3, c0, c2, c3, c4
#monitor# a0, a1, a2, a3, a4, b0, b1, b2, b3, tau, ped, dic, c0, c1, c2, c3, c4
}"

When I include 'c4*S1[i]' I get the following error:

Possible directed cycle involving some or all of the following nodes

It then proceeds to list all values of S1, S2, lambda1 and lambda2.

Removing 'c4*S1[i]' results in the code running.

I've had a look through the following threads:

Possible directed cycle error in JAGS

https://stats.stackexchange.com/questions/220312/coding-a-jags-error-model-for-a-dependent-variable-that-has-increasing-variance

The issues in them seems to have been caused by the poster using 'y' on both sides of an equation. I think that my issue is caused by the fact that a4 sends the code to the S2 section and c4 sends it back to the S1 section, which is a bit like a directed cycle. Any idea how to get around this?

I've included the top rows of the dataset in case it's of any use:

S1 S2 Logit_tussac moisture DEM_slope DEM_aspect DEM_elevation sentinel1 sentinel2 sentinel3 sentinel4 sentinel5 sentinel6 sentinel7 sentinel8 sentinel9 sentinel10
NA NA     NA            NA  2.434239   168.5011   0.588606366    0.0413    0.0499    0.0531    0.1035    0.1862    0.1968    0.1808    0.1318    0.0400     0.0199
NA NA     NA            NA  3.705001   178.1289   1.007037127    0.0966    0.1108    0.1212    0.0855    0.0917    0.1063    0.0937    0.1842    0.0341     0.0161
NA NA     NA            NA  5.006181   180.0000   1.883010797    0.1309    0.1472    0.1361    0.0855    0.0917    0.1063    0.0937    0.1572    0.0341     0.0161
NA NA     NA            NA  5.006181   180.0000   2.758984468    0.0542    0.0512    0.0472    0.0145    0.0127    0.0092    0.0166    0.0510    0.0148     0.0080

Dataset subset so that only the 117 rows that contain remote and locally sensed data:

S1 S2 Logit_tussac moisture DEM_slope DEM_aspect DEM_elevation sentinel1 sentinel2 sentinel3 sentinel4 sentinel5 sentinel6 sentinel7 sentinel8 sentinel9 sentinel10
NA NA        NA        NA   14.917334   256.1612      12.24432    0.0513    0.0588    0.0541    0.1145    0.1676    0.1988    0.1977    0.1658    0.1566     0.0770
0  0  -9.210240         1   23.803741   225.1231      16.88028    0.1058    0.1370    0.2139    0.2387    0.2654    0.2933    0.3235    0.2928    0.3093     0.1601
NA NA        NA        NA   20.789165   306.0945      18.52480    0.0287    0.0279    0.0271    0.0276    0.0290    0.0321    0.0346    0.0452    0.0475     0.0219
NA NA -9.210240         1    6.689442   287.9641      36.08975    0.0462    0.0679    0.1274    0.1535    0.1797    0.2201    0.2982    0.2545    0.4170     0.2252
0  0  -9.210240         1   25.476444   203.0659      23.59964    0.0758    0.1041    0.1326    0.1571    0.2143    0.2486    0.2939    0.2536    0.3336     0.1937
1  0  -1.385919         3    1.672511   270.0000      39.55215    0.0466    0.0716    0.1227    0.1482    0.2215    0.2715    0.3334    0.2903    0.3577     0.1957
Bong112
  • 161
  • 10

1 Answers1

4

As you have correctly identified, your problem is a directed cycle in the model graph. The reason this is a problem is that it turns out to be quite important that a DAG (directed acyclic graph) does not contain any directed cycles, otherwise there is no guarantee that we can define stable posteriors to sample from.

For example, take the following model containing a directed cycle:

model <- 'model{

    for(i in 1:N){
        a[i] ~ dnorm(b[i], tau)
        b[i] ~ dnorm(a[i], tau)
    }

    tau ~ dgamma(0.01,0.01)

    #monitor# tau
    #data# N

}'

N <- 10
runjags::run.jags(model)

There is no sensible way to estimate this model, and JAGS will tell you so. But it is in theory possible to estimate this model:

model <- 'model{

    for(i in 1:N){
        a[i] ~ dnorm(b[i], tau)
        b[i] ~ dnorm(a[i], tau)
    }

    tau ~ dgamma(0.01,0.01)

    #monitor# tau
    #data# N, a

}'

N <- 10
a <- rnorm(N)
runjags::run.jags(model)

What has changed is that all a[] are now fixed (observed), so we actually can estimate this model. But JAGS still detects the directed cycle, so a workaround is needed:

model <- 'model{

    for(i in 1:N){
        a[i] ~ dnorm(b[i], tau)
        b[i] ~ dnorm(aa[i], tau)
    }

    tau ~ dgamma(0.01,0.01)

    #monitor# tau
    #data# N, a, aa

}'

N <- 10
a <- rnorm(N)
aa <- a
runjags::run.jags(model)

This hides the directed cycle by tricking JAGS into thinking that a[] and aa[] are unrelated. But this only works when all a[] are observed/fixed, otherwise the missing aa[] are not estimated or defined within the model. In your case, S1[] and S2[] seem to be partially observed, so this trick will not work unless you can simply omit the rows/observations with missing S1 or S2 (which may not be feasible as you say they have a high proportion of NA).

Otherwise you will have to re-formulate your model somehow to break the directed cycle. This would involve thinking about the biological process underlying your system and how to represent the relationship you want without creating a directed cycle, so it is not really something we can help with.

Hope that helps,

Matt

Matt Denwood
  • 2,537
  • 1
  • 12
  • 16