I had this question on a test in an online course. I got it right only by trial and error, and since I am a beginner in R programming, I am most likely doing something wrong.
This is the question:
Before you read my R code, note that for the last part I had to convert all the columns to numeric, because otherwise I got the following message:
"Error in rowMeans(DT) : 'x' must be numeric."
In the test, my professor's solution is: "DT[,mean(pwgtp15), by=SEX]"
With my R code, the right answer is mean(DT$pwgtp15, by=DT$SEX).
I get this output:
My concern is that the way I got DT[,mean(pwgtp15), by=SEX] to work may make the computation slow.
To do that, I used
DT <- data.frame(data.matrix(DT))
Which one is the right answer? The professor's solution? My answer? Another one?
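Before comparing speed, it may help to check what each candidate actually returns. The following is only a sketch on a small made-up data frame (the name `toy` and its values are hypothetical, not taken from the ACS file). It relies on the fact that base R's `mean()` has no `by` argument, so an extra `by=` is absorbed by `...` and ignored.

```r
# Toy data standing in for the ACS file (values are made up)
toy <- data.frame(SEX = c(1, 1, 2, 2), pwgtp15 = c(10, 20, 30, 50))

# Base R grouped mean: one value per level of SEX (15 and 40 here)
by_group <- tapply(toy$pwgtp15, toy$SEX, mean)
print(by_group)

# mean() ignores `by`, so this is the mean of the whole column (27.5 here)
overall <- mean(toy$pwgtp15, by = toy$SEX)
print(overall)
```

If the two results differ in length like this, the faster expression is not necessarily answering the same question.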
Here is my code:
#THE SOLUTION IS DT[,mean(pwgtp15), by=SEX]
#HOWEVER, my solution is mean(DT$pwgtp15, by=DT$SEX)
install.packages("data.table")
library("data.table")
# Download and read the data
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv", destfile = "ACS.csv")
DT <- fread("ACS.csv", sep = ",")
# Each expression below is timed over 100 runs
counter<- 0
myName<-"DT[,mean(pwgtp15), by=SEX]"
for (i in 1:100)
{
a<- Sys.time()
DT[,mean(pwgtp15), by=SEX]
b<-Sys.time()
myTime<-b-a
counter<- counter + myTime
}
cat("counter is: ", counter, "myName is: ", myName, "\n")
counter<- 0
myName<-"mean(DT[DT$SEX==1,]$pwgtp15);mean(DT[DT$SEX==2,]$pwgtp15)"
for (i in 1:100)
{
a<- Sys.time()
mean(DT[DT$SEX==1,]$pwgtp15); mean(DT[DT$SEX==2,]$pwgtp15)
b<-Sys.time()
myTime<-b-a
counter<- counter + myTime
}
cat("counter is: ", counter, "myName is: ", myName, "\n")
counter<- 0
myName<-"sapply(split(DT$pwgtp15,DT$SEX),mean)"
for (i in 1:100)
{
a<- Sys.time()
sapply(split(DT$pwgtp15,DT$SEX),mean)
b<-Sys.time()
myTime<-b-a
counter<- counter + myTime
}
cat("counter is: ", counter, "myName is: ", myName, "\n")
counter<- 0
myName<-"tapply(DT$pwgtp15, DT$SEX, mean)"
for (i in 1:100)
{
a<- Sys.time()
tapply(DT$pwgtp15, DT$SEX, mean)
b<-Sys.time()
myTime<-b-a
counter<- counter + myTime
}
cat("counter is: ", counter, "myName is: ", myName, "\n")
counter<- 0
myName<-"mean(DT$pwgtp15, by=DT$SEX)"
for (i in 1:100)
{
a<- Sys.time()
mean(DT$pwgtp15, by=DT$SEX)
b<-Sys.time()
myTime<- b-a
counter<- counter + myTime
}
cat("counter is: ", counter, "myName is: ", myName, "\n")
# Convert the entire data frame to numeric;
# otherwise rowMeans() fails with "'x' must be numeric"
DT <- data.frame(data.matrix(DT))
counter<- 0
myName<-"rowMeans(DT)[DT$SEX==1];rowMeans(DT)[DT$SEX==2]"
for (i in 1:100)
{
a<- Sys.time()
rowMeans(DT)[DT$SEX==1];rowMeans(DT)[DT$SEX==2]
b<-Sys.time()
myTime<- b-a
counter<- counter + myTime
}
cat("counter is: ", counter, "myName is: ", myName, "\n")
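The repeated timing loops above could probably be condensed with a small helper built on base R's `system.time()`. This is only a sketch: `time_runs`, `toy`, and `candidates` are names I made up for illustration, and the data is random rather than the ACS file, so the numbers it prints say nothing about the real data set.

```r
# Hypothetical helper: total elapsed seconds for n evaluations of f
time_runs <- function(f, n = 100) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

# Made-up data so the sketch runs without the ACS download
set.seed(1)
toy <- data.frame(SEX = sample(1:2, 1e4, replace = TRUE),
                  pwgtp15 = runif(1e4))

# Each candidate is wrapped in a function so it can be timed the same way
candidates <- list(
  tapply       = function() tapply(toy$pwgtp15, toy$SEX, mean),
  sapply_split = function() sapply(split(toy$pwgtp15, toy$SEX), mean)
)

# Named numeric vector: one total time (in seconds) per candidate
timings <- sapply(candidates, time_runs)
print(timings)
```

Wrapping each expression in a function keeps the timing logic in one place, so adding another candidate is one more entry in the list.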