I'm fairly new to R. I have a big data frame called MegaFrame (11,000,000 rows). I want to build another data frame holding the mean of MegaFrame$value for each combination of session and P_CODE. Computing this for every combination gives a lot of NA values, because many P_CODE-session pairs don't exist in the frame. I found a solution that I think works, but it has now been running for 12 hours and still hasn't finished.
# intended column types of the result (not used below)
colClasses <- c("integer", "factor", "integer")
col.names <- c("MeanMesure", "P_CODE", "session")
# start from an empty frame instead of building a dummy first row and dropping it
MeanFrame <- data.frame(MeanMesure = numeric(0), P_CODE = character(0), session = integer(0))
sessions <- unique(MegaFrame$session)
codes <- levels(MegaFrame$P_CODE)
for (i in seq_along(sessions)) {
  for (j in seq_along(codes)) {
    # element-wise & (not &&); i indexes sessions, j indexes P_CODE levels
    x <- mean(MegaFrame$value[MegaFrame$session == sessions[i] & MegaFrame$P_CODE == codes[j]])
    df <- data.frame(x, codes[j], sessions[i])
    colnames(df) <- col.names
    # growing the frame one row at a time: this is the slow part
    MeanFrame <- rbind(MeanFrame, df)
  }
}
I know I can add a condition so that the NA values are not added to the data frame. But I feel my method is too heavy for what I want to do (every iteration builds a data frame, renames its columns, then calls rbind), and I don't know how to make it lighter. I have already had a lot of trouble adding rows to a data frame.
Does anybody have ideas for this?
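One possible direction, offered only as a minimal sketch and not as the definitive answer: a grouped aggregation computes every per-pair mean in a single pass, without the nested loops or the repeated rbind. The example below uses base R's aggregate() on a small made-up stand-in for MegaFrame (the toy values are hypothetical; only the column names value, P_CODE and session come from the question). Pairs that never occur in the data are simply absent from the result, so no NA rows need to be filtered out.

```r
# Toy stand-in for MegaFrame (hypothetical data; the real frame has ~11,000,000 rows)
MegaFrame <- data.frame(
  value   = c(10, 20, 30, 40, 50),
  P_CODE  = factor(c("A", "A", "B", "B", "C")),
  session = c(1L, 1L, 1L, 2L, 2L)
)

# One grouped pass: mean of value for every P_CODE/session pair present in the data.
# Combinations that do not occur are not generated, so there are no NA rows.
MeanFrame <- aggregate(value ~ P_CODE + session, data = MegaFrame, FUN = mean)

# Rename and reorder to match the column layout used in the question
names(MeanFrame)[names(MeanFrame) == "value"] <- "MeanMesure"
MeanFrame <- MeanFrame[, c("MeanMesure", "P_CODE", "session")]

print(MeanFrame)
```

On a frame of this size, a dedicated package such as data.table may well be faster than aggregate(), but that is an assumption to benchmark on the real data rather than a guarantee.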