0

Apologies if this doesn't make any sense, this is my first experience using R (and coding in general) and I'm lost. Any help would be greatly appreciated.

I have a data set of case numbers from various countries and years for measles.

So far, I have (I'm assuming correctly but am not completely sure):

  • subsetted that data set to only contain info for after 1998 (to 2018), called All_after

  • using a loop (1-194 being the different countries), run a linear model of the effect of year on the natural log during this time period

What I should have is slope for each country from the linear model (which I can then plot as histograms). However, with my current code I am 'slope-less'. I've tried abline functions and lm functions inside and outside the loop but get the answer that whatever I put in the brackets isn't finite.

Here's my code so far:

All_after <- subset(Measles, year >= 1998, year < 2018)

attach(All_after)

All_after$Cname <- "1":"194"

log_All_after <- log(All_after[,7]+1)

i <- c(Cname="1":"194")

for (val in i) {

with(All_after[All_after$Cname==1,] plot(year,log_All_after))

model5 <- lm(log_All_after~year, data=All_after, subset=i)

} 

Again, apologies if this is all gibberish!

Edit:

Using the summary function, this is what appears for All_after :

> summary(All_after)
       X        WHO_REGION     ISO_CODE                    Cname     
 Min.   :3493   AFR : 987   AFG    :  21   Afghanistan        :  21  
 1st Qu.:4511   AMR : 735   AGO    :  21   Albania            :  21  
 Median :5530   EMR : 441   ALB    :  21   Algeria            :  21  
 Mean   :5530   EUR :1113   AND    :  21   Andorra            :  21  
 3rd Qu.:6548   SEAR: 231   ARE    :  21   Angola             :  21  
 Max.   :7566   WPR : 567   ARG    :  21   Antigua and Barbuda:  21  
                            (Other):3948   (Other)            :3948  
    Disease          year          cases         
 measles:4074   Min.   :1998   Min.   :     0.0  
                1st Qu.:2003   1st Qu.:     0.0  
                Median :2008   Median :    20.5  
                Mean   :2008   Mean   :  2402.6  
                3rd Qu.:2013   3rd Qu.:   446.5  
                Max.   :2018   Max.   :217151.0  
                               NA's   :294     
Maddie
  • 1
  • 2
  • 1
    Hi Maddie, it's hard to know what went wrong with your code without knowing what is in All_after. Can you provide an example say with 3 countries? And also what is dim(All_after) ? – StupidWolf Dec 16 '19 at 13:00
  • 2
    Do not use use `attach`. Use `library(nlme); models <- lmList(log_All_after ~ year | Cname, data=All_after); coef(models)`. However, I don't understand your creation of `Cname`. Syntax like `"1":"194"` doesn't make sense. It should probably be `factor(1:194)` but at least that still does assumptions regarding the order of your data and at worst it is wrong. – Roland Dec 16 '19 at 13:12
  • 2
    To improve your chances of getting a useful answer, please read [this](https://stackoverflow.com/help/how-to-ask) post. In particular, including a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question is essential. – Samuel Dec 16 '19 at 13:38
  • @Roland thank-you for your answer. I've tried adding your line of code in in various places as well as changing that bit to factor(1:194) but it just keeps throwing up errors. Was there a specific bit I should add it in? – Maddie Dec 16 '19 at 21:36
  • @StupidWolf thank-you for editing my question so bits appeared in the grey coding box and for your answer. I've added what comes up under summary (All_after) - does that help? – Maddie Dec 16 '19 at 21:37
  • Hi @Maddie, see below, does it work for you? – StupidWolf Dec 17 '19 at 08:26

1 Answers1

0

what I have written below can get you started. I am roughly guessing what you data is like, and simulate some values for 3 countries:

n = 21 

All_after <- data.frame(
  X = sample(3000:4000,n*3),
  WHO_REGION = rep(c("EMR","EUR","EUR"),each=n),
  ISO_CODE = rep(c("AFG","AND","ALB"),each=n),
  Cname = rep(c("Afghanistan","Andorra","Albania"),each=n),
  Disease="measles",
  year = sample(1998:2018,n*3,replace=TRUE),
  cases = rnbinom(n*3,mu=400,size=0.1)
)

We do log as you have on the 7th column. I add it to the data.frame

All_after$log_All_after <- log(All_after[,7]+1)

Below is the part I made changes, you need to create a unique vector of country names to iterate through and also a list to store your results:

# create a vector of all your countries to iterate through
all_countries <- unique(All_after$Cname)

# to store the output
output <- vector("list",length(all_countries))
names(output) <- all_countries

Here i plot 3 per page using par(mfrow=c(1,3)), you can remove this if you prefer one plot per page. The key is to store your plots in a pdf, so it will not be lost.

pdf("regress.pdf",width=12,height=8)
par(mfrow=c(1,3))

for (val in all_countries) {

# you need to subset using Cname == val
countryData <- All_after[All_after$Cname==val,]
with(countryData, plot(year,log_All_after,main=val))

model5 <- lm(log_All_after~year, data=countryData)
abline(model5,lty=8,col="blue")
output[[val]] <- coefficients(model5)

} 
dev.off()

enter image description here

And you can collect all your results

 do.call(rbind,output)
            (Intercept)        year
Afghanistan  -202.79141  0.10199578
Andorra       -41.21576  0.02161816
Albania       115.88641 -0.05598124
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • Thank-you very much for the time and effort you've put into trying to answer my question. It's very much appreciated. I'm afraid the code doesn't work for what I'm trying to accomplish. Your graphs look great so it's quite likely I'm being dense about how to use it in a loop for all 194 countries. I asked a friend who is a programmer (though not in R) to have a look at this question and they were stumped even after spending 4 hours on it. They put in a nested loop for the countries but couldn't get into the linear model bit, so I am still slope-less and histogram-less. – Maddie Dec 17 '19 at 23:46
  • If I am asked to do C or java, i would be stumped for 4 hours too. I think it's about asking the right question and explaining the problem to others. Maybe have a good at editing your question :) – StupidWolf Dec 17 '19 at 23:55