I have three variables in my dataset: State, Year, and Serotype. The code below aggregates the line-listed data. I create empty data frames (Agg.Res1, Agg.Res2, and so on) to store the results of the for loop. My problem: how do I create empty data frames to store the results for different years? I want to run the calculations for each year. When I run this code, it only does the calculation for 2013, because I haven't created an empty data frame for data7 to store the results for each year. Any help would be much appreciated.

Agg.Res <- data.frame(matrix(NA, nrow=11, ncol=17)) # template: 11 years x 17 drug columns
for( i in 1:50 ){ # create 50 sequentially numbered data frames
  dataframe.name <- paste( "Agg.Res",i, sep="") # builds the data frame name, e.g. "Agg.Res1"
  assign( dataframe.name, Agg.Res, envir = .GlobalEnv) # assigns the template data frame to that name
}

#For State Illinois
data6<-data3[which(data3$State=="Illinois"),]

for(i in 2003:2013){  # loop over the years
  data7<-data6[which(data6$YEAR==i),]

  Ent1<-data7[which(data7$SEROTYPE_GR=="A"),]
  Agg.Res1[i-2002,]<-colSums(Ent1[,31:47], na.rm=T)/nrow(Ent1)

  Ent2<-data7[which(data7$SEROTYPE_GR=="B"),]
  Agg.Res2[i-2002,]<-colSums(Ent2[,31:47], na.rm=T)/nrow(Ent2)

  Ent3<-data7[which(data7$SEROTYPE_GR=="C"),]
  Agg.Res3[i-2002,]<-colSums(Ent3[,31:47], na.rm=T)/nrow(Ent3)

  Ent4<-data7[which(data7$SEROTYPE_GR=="D"),]
  Agg.Res4[i-2002,]<-colSums(Ent4[,31:47], na.rm=T)/nrow(Ent4)

  Ent5<-data7[which(data7$SEROTYPE_GR=="E"),]
  Agg.Res5[i-2002,]<-colSums(Ent5[,31:47], na.rm=T)/nrow(Ent5)
}

The data looks like this:

State      Year   Serotype   Drug A   Drug B   Drug C   ...
Illinois   2003   A          1        0        1        ...
Illinois   2003   B          0        0        1        ...
...        ...    ...        ...      ...      ...      ...
Missouri   2008   E          1        1        1        ...

The years range from 2003 to 2013, the serotypes from A to E, and the data cover several states. The drug columns are binary: 1 if the serotype is resistant to the drug, 0 if it is not.

Tivos Jar
  • This sounds like something that *might* be solveable using `lapply` *if* you could share some example data that illustrate your problem. Right now, the whole setup seems a bit messy to me... – SimonG Aug 04 '15 at 22:36
  • @SimonG Thanks for the response. I have added sample data set and more explanation to what I'm trying to achieve. I'm basically trying to aggregate a line-listed data. – Tivos Jar Aug 05 '15 at 14:50
  • The data you provided isn't particularly helpful because it doesn't enable other users to run your code. See this question for some great tips on making a question more accessible: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – SimonG Aug 05 '15 at 15:10

1 Answer

It seems like you are doing a lot more work than necessary. I would recommend using data.table:

library(data.table)
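# Select the grouping columns by name so that `by` below can find them;
# the drug columns are taken by position (31:47, as in the question)
dt_data <- as.data.table(data6)[, c("YEAR", "SEROTYPE_GR", names(data6)[31:47]), with = FALSE]

# calculate column means by YEAR and SEROTYPE_GR. Resulting object is a data.table of the results
# (na.rm = TRUE drops NAs from both the sum and the count, unlike colSums(..., na.rm=T)/nrow(...))
dt_colSumar <- dt_data[, lapply(.SD, mean, na.rm = TRUE), by = c("YEAR", "SEROTYPE_GR") ]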

# split into list by SEROTYPE_GR
serotype_list <- split(dt_colSumar, dt_colSumar$SEROTYPE_GR)

# if you REALLY want to assign back to data frames
for (i in seq_along(serotype_list)){  # one data frame per serotype actually present
  assign(paste0("Agg.Res", i), as.data.frame(serotype_list[[i]]), envir = .GlobalEnv)
}
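
If you would rather stay in base R, the same idea (group by year and serotype, then split by serotype) can be sketched with aggregate() and split(). This is only a sketch on made-up data: the toy data frame and the DrugA/DrugB columns are invented for illustration, and only YEAR and SEROTYPE_GR are taken from the question's code. Note that mean(..., na.rm = TRUE) leaves NA rows out of the denominator, whereas colSums(..., na.rm=T)/nrow(...) keeps them in, so the two can differ when values are missing.

```r
# Toy line-listed data (invented for illustration; not the asker's data)
toy <- data.frame(
  State       = "Illinois",
  YEAR        = rep(2003:2005, each = 4),
  SEROTYPE_GR = rep(c("A", "A", "B", "B"), times = 3),
  DrugA       = c(1, 0, 1, 1, 0, 0, 1, NA, 1, 1, 0, 0),
  DrugB       = c(0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Hypothetical drug column names; in the real data these would be columns 31:47
drug_cols <- c("DrugA", "DrugB")

# Proportion resistant per drug for every YEAR x SEROTYPE_GR combination
agg <- aggregate(
  toy[drug_cols],
  by  = list(YEAR = toy$YEAR, SEROTYPE_GR = toy$SEROTYPE_GR),
  FUN = mean, na.rm = TRUE
)

# One data frame per serotype, one row per year
by_serotype <- split(agg, agg$SEROTYPE_GR)
```

by_serotype$A then plays the role of Agg.Res1, by_serotype$B of Agg.Res2, and so on, without creating numbered objects in the global environment.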
mlegge