1

I am working with a spatial dataset where the response is binary 0 or 1. I want to estimate the beta coefficients and account for the spatial autocorrelation, so am trying to fit a binomial regression model using the CARBayes package S.CARleroux with family="binomial". However, sometimes when I do this it works and the result is a distribution for the coefficients, but with other datasets (formatted the same way) the trace plots remain flat and there the effective sample number n.effective is 0. For example, in this case, it appears that the sampling was working, and then essentially failed: enter image description here

Other times, again with very similar datasets, the result is enter image description here

The code I'm using is: car.prox <- S.CARleroux(formula=f, data = df, W=W, family="binomial",burnin=burn.in, n.sample=n.sample,thin=20, trials=rep(1,nrow(df)), rho = 1, verbose=TRUE)

The output, in the second case, is

#################
#### Model fitted
#################
Likelihood model - Binomial (logit link function) 
Random effects model - Leroux CAR
Regression equation - y ~ var2 + var3 + var4 + var5 + var6 + 
    var7 + var8 + var9 + var10
Number of missing observations - 0

############
#### Results
############
Posterior quantities and DIC

                Median       2.5%      97.5% n.sample % accept n.effective
(Intercept)   -12.2061   -12.2061   -12.2061    10000      0.8         0.0
var2           -1.7098    -1.7098    -1.7098    10000      0.8         0.0
var3            6.2169     6.2169     6.2169    10000      0.8         0.0
var4           21.3834    21.3834    21.3834    10000      0.8         0.0
var5           -8.3727    -8.3727    -8.3727    10000      0.8         0.0
var6            7.8046     7.8046     7.8046    10000      0.8         0.0
var7            2.3668     2.3668     2.3668    10000      0.8         0.0
var8           -8.7651    -8.7651    -8.7651    10000      0.8         0.0
var9           -1.0197    -1.0197    -1.0197    10000      0.8         0.0
var10          -3.6726    -3.6726    -3.6726    10000      0.8         0.0
tau2        11022.7758 10122.9703 12068.6439    10000    100.0      4420.2
rho             1.0000     1.0000     1.0000       NA       NA          NA
            Geweke.diag
(Intercept)         NaN
var2                NaN
var3                NaN
var4                NaN
var5                NaN
var6                NaN
var7                NaN
var8                NaN
var9                NaN
var10               NaN
tau2               -2.3
rho                  NA

DIC =  NaN       p.d =  NaN       LMPL =  -2871.75

How can I fix this and get the distribution for the coefficients in a way that accounts for the spatial autocorrelation?

Reproducible example:

# libraries
library(CARBayes)
library(Matrix)
# data as txt can be copied from https://gist.github.com/tommlogan/584f235f60bb05b5ae57e0d7e44b7aee
# there are dput options and .txt options
df <- read.table('atl_data.txt')
W <- readMM(file='atl_neighbors.txt')
W <- as(W, 'matrix')*1
# parameters
burn.in  <- 50000 #100000
n.sample <- 150000 # 300000
thin.n   <- 20
explan_vars = c('black','asian','amindian','hispanic','youth','aged','poverty','density','female')

# Now we fit the model to three different response variables in the same dataset and see the output changes
# this first one at least samples randomly (the trace plots are suitable)
f <- as.formula(paste("class_prox~", paste(explan_vars, collapse=" + ")))
car.lm <- S.CARleroux(formula=f, data = df, W=W, family="binomial",burnin=burn.in, n.sample=n.sample,thin=thin.n, trials=rep(1,nrow(df)), rho = 1, verbose=TRUE)
print(car.lm)
plot(car.lm$samples$beta)
# in this example the trace plots initially indicate variation in the samples, but then it flatlines at a constant value
f <- as.formula(paste("class_crime~", paste(explan_vars, collapse=" + ")))
car.lm <- S.CARleroux(formula=f, data = df, W=W, family="binomial",burnin=burn.in, n.sample=n.sample,thin=thin.n, trials=rep(1,nrow(df)), rho = 1, verbose=TRUE)
print(car.lm)
plot(car.lm$samples$beta)
# in this example the trace plots are a constant value the entire time
f <- as.formula(paste("class_cong~", paste(explan_vars, collapse=" + ")))
car.lm <- S.CARleroux(formula=f, data = df, W=W, family="binomial",burnin=burn.in, n.sample=n.sample,thin=thin.n, trials=rep(1,nrow(df)), rho = 1, verbose=TRUE)
print(car.lm)
plot(car.lm$samples$beta)
Tom Logan
  • 341
  • 3
  • 15
  • 2
    It would help a lot if you'd provide us with the data you used (or similar data that produce the same problems). – RBirkelbach Aug 14 '19 at 05:55
  • Sure thing @rbirkelbach. Here is data and jupyter notebooks of code. You'll see there are two of each, but the main difference is the region studied. I've provided the code to show how the response variable is calculated. For each region, there are three response variables and the CARBayes converges differently for each of those. https://drive.google.com/drive/folders/1X8B-gYZfocm-XIFUEIna6OQjnuJVWLUw?usp=sharing – Tom Logan Aug 14 '19 at 16:27
  • 1
    @TomLogan Sharing data through third-party file hosting services is usually not a good idea as most people (myself included) are loath to download files from unknown sources. This is an interesting question, and you mentioning that the model results in sensible Markov chains for some data but not others may suggest an issue with the priors (e.g. are they flat?). In order to elicit a better response I suggest providing a self-contained & minimal code example including data (see the [advice given this link](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)). – Maurits Evers Aug 19 '19 at 09:16
  • Thanks @maurits-evers Struggled with the dput because the data is large, and this isn't an issue that can be reproduced with a subset as it's based on the neighbors and spatial structure. but it is now in txt files that can be viewed and copied (rather than downloaded) from gisthub. Short and sweet code is in the edited question above – Tom Logan Aug 29 '19 at 00:06

0 Answers0