0

I have a dataset of about 144 entries and 93 variables, where each column correspond to a municipality and the variables account for yearly measurements of environmental data (e.g: temperature, vegetated area, rainfall, etc). As said before, the variables are divided yearly, so I have one column named rainfall_2004, another one for rainfall_2005 and so on. The entire dataset has a timespan of 10 years. Here's a picture to better illustrate:

Just a little piece of my data

I wanted to develop a script where I could create a GLM for each municipality at each year. Luckily, I found Zuur's book, "Mixed Effect Models and Extensions in Ecology with R", which provides such code in one of his examples. I tried adapting it to my dataset, but something went wrong. My knowledge with R is a bit limited, so I'm missing something but I can't quite find it.

Here's Zuur's code:

library(AED); data(RIKZ)

Beta <- vector(length = 9)
for (i in 1:9) {
    Mi <- summary(lm(Richness ∼ NAP, subset = (Beach==i), data=RIKZ))
    Beta[i] <- Mi$coefficients[2, 1]
}

Now here's mine:

count <- dados_ampliados[, 1]
View(count)

for (i in count) {
    RA <- summary(glm(dados_ampliados$infect_2004 ~ dados_ampliados$mmax_2004 +
                                                    dados_ampliados$mmin_2004 +
                                                    dados_ampliados$mprec_2004 +
                                                    dados_ampliados$mumid_2004 +
                                                    dados_ampliados$prop_for_2004 +
                                                    dados_ampliados$prop_urb_2004 +
                                                    dados_ampliados$prod_2004,
                     family = poisson(),
                     subset = (dados_ampliados$Geocode==i),
                     data = dados_ampliados))

    count[i] <- RA$coefficients[2, 1]
}

Yet my code returns:

Error in `[<-.data.frame`(`*tmp*`, i, value = 0.357095537720183) : 
  new columns would leave holes after existing columns

Any ideas as why is this happening? Thanks in advance.

Some observations:

File used in this code can be obtained here. This is a WeTransfer file, so it won't last forever.

In his text, Zuur explains that he's creating that model to analyze data on 9 different beaches. In his code, he compares the value of the 1:9 vector to the beach value, therefore I'm assuming the beaches aren't named, but numbered instead. So, for each value of the vector, he's going to model the corresponding beach. My data however isn't organized like that, but with geocodes provided by the Brazilian Institute of Statistics and Geography, therefore my adaption consisted on creating a vector of 144 entries, one for each row, and each one is populated by the municipalities' geocode. This and the substition of lm for glm were my main adaptations.

For the troubleshooting, I already tried changing the values of RA$coefficients from 2,1 to 1,1 or 1,2. The error remained.

Eric Lino
  • 429
  • 4
  • 10
  • 2
    Don't assign your results to your index vector (`for (i in count) ... count[i] <- stuff`). Make some new object to put them in. – Frank Apr 12 '18 at 18:18
  • This did solve the problem , however I'm getting only one coefficient per variable. The expected output was one coefficient per municipality (144). – Eric Lino Apr 12 '18 at 18:24
  • Hm, that's odd. I can't install AED on my version of R and can't load your data and so can't test. You could maybe try making a simpler/minimal reproducible example so it's easier for others to investigate. Guidance here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Apr 12 '18 at 18:42
  • Thank you for your input, Frank. I'll look into it. Before I do, I just wanted to say that the AED package isn't required to do the operations I listed. It was included in Zuur's code, but as far as I know, it wasn't really used for anything. – Eric Lino Apr 12 '18 at 18:46
  • Frank, I've added a download link for the data on the question! You should be able to use the exact same code I wrote, changing `dados_ampliados` for `subset_example`. – Eric Lino Apr 12 '18 at 19:23
  • I am a bit surprised that a book about mixed effects models provides script for single models for each level. I would rather use a mixed/ multilevel model to model your data. Using the `lme4`package and the `lmer()` function. – FAMG Apr 12 '18 at 19:42
  • You are right, @FAMG. a mixed model would be a better choice and ultimately it will be the model I'll choose, but i'm facing some trouble inputting my data as a time series, that is, informing R that 2004 comes after 2005, etc. I wanted to make those single level models just out of sheer curiosity. Maybe I could compare the accuracy of both methods. Anyway, I'll look into that lme4 package. I didn't know about it. Thank you very much! – Eric Lino Apr 12 '18 at 21:21
  • I assume you have to reearrange your data from wide to long format to use it for multilevel modelling. I suggest to have a look at the `reshape2`package and the `melt()`function. Here is a nice explanation how to do it: [Changing data format](http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/) – FAMG Apr 13 '18 at 08:10

0 Answers0