I have a dataset of 144 entries and 93 variables, where each row corresponds to a municipality and the variables are yearly measurements of environmental data (e.g. temperature, vegetated area, rainfall). The variables are split by year, so I have one column named rainfall_2004, another named rainfall_2005, and so on. The entire dataset spans 10 years. Here's a picture to better illustrate:
I want to write a script that fits a GLM for each municipality in each year. Luckily, I found Zuur's book, "Mixed Effects Models and Extensions in Ecology with R", which provides code for this kind of loop in one of its examples. I tried adapting it to my dataset, but something went wrong. My knowledge of R is a bit limited, so I'm clearly missing something, but I can't find what.
Here's Zuur's code:
library(AED); data(RIKZ)
Beta <- vector(length = 9)
for (i in 1:9) {
  Mi <- summary(lm(Richness ~ NAP, subset = (Beach == i), data = RIKZ))
  Beta[i] <- Mi$coefficients[2, 1]
}
Now here's mine:
count <- dados_ampliados[, 1]
View(count)
for (i in count) {
  RA <- summary(glm(dados_ampliados$infect_2004 ~ dados_ampliados$mmax_2004 +
                      dados_ampliados$mmin_2004 +
                      dados_ampliados$mprec_2004 +
                      dados_ampliados$mumid_2004 +
                      dados_ampliados$prop_for_2004 +
                      dados_ampliados$prop_urb_2004 +
                      dados_ampliados$prod_2004,
                    family = poisson(),
                    subset = (dados_ampliados$Geocode == i),
                    data = dados_ampliados))
  count[i] <- RA$coefficients[2, 1]
}
Yet my code returns:
Error in `[<-.data.frame`(`*tmp*`, i, value = 0.357095537720183) :
new columns would leave holes after existing columns
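For reference, the same message can be triggered on toy data, which suggests (an assumption on my part, not something I have confirmed on my own dataset) that it appears when a value is assigned into a data frame at a column position far beyond its last column. The data frame `toy` and the index 10 below are made up for illustration:

```r
# Toy data frame with a single column (made-up values, not my dataset)
toy <- data.frame(code = c(10, 20, 30))

# Assigning to column position 10 when only 1 column exists would leave
# positions 2 to 9 undefined, so the assignment is refused
msg <- tryCatch(
  {
    toy[10] <- 0.357
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that is what happens in my loop, it would mean `count` is a data frame rather than a plain vector, and `i` holds geocode values rather than positions.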
Any ideas as to why this is happening? Thanks in advance.
Some observations:
The file used in this code can be obtained here. This is a WeTransfer file, so it won't last forever.
In his text, Zuur explains that he's creating that model to analyze data from 9 different beaches. In his code, he compares each value of the 1:9 vector to the Beach value, so I'm assuming the beaches are numbered rather than named. For each value of the vector, he then models the corresponding beach. My data, however, isn't organized like that: municipalities are identified by geocodes provided by the Brazilian Institute of Geography and Statistics (IBGE). My adaptation therefore consisted of creating a vector of 144 entries, one for each row, each holding a municipality's geocode. This and replacing lm with glm were my main adaptations.
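To make the intended adaptation concrete, here is a minimal sketch on simulated data of the pattern I understand Zuur's loop to follow when the groups are identified by arbitrary codes rather than by 1 to 9. Everything in it is an assumption for illustration: the data frame `toy`, its columns `geocode`, `year`, `infect` and `mmax`, the made-up codes, and the long layout (one row per municipality and year), which is not how my file is organized.

```r
set.seed(1)

# Simulated long-format data: 3 made-up codes, 10 yearly rows each
toy <- data.frame(
  geocode = rep(c(101, 102, 103), each = 10),
  year    = rep(2004:2013, times = 3),
  mmax    = runif(30, min = 25, max = 35)
)
toy$infect <- rpois(30, lambda = exp(0.05 * toy$mmax))

# One slot per distinct code, filled by position rather than by code value
codes <- unique(toy$geocode)
beta  <- numeric(length(codes))
names(beta) <- codes

for (k in seq_along(codes)) {
  # Subset explicitly so the formula can use bare column names
  fit <- glm(infect ~ mmax, family = poisson(),
             data = toy[toy$geocode == codes[k], ])
  beta[k] <- coef(summary(fit))[2, 1]  # slope estimate for mmax
}

print(beta)
```

Indexing the result by position (`k`) instead of by the code itself keeps `beta` at exactly one element per group, which is the role `Beta[i]` plays in Zuur's loop.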
For troubleshooting, I already tried changing the indices into RA$coefficients from [2, 1] to [1, 1] or [1, 2]. The error remained.