I'm trying to write a function that returns summary statistics by group, as shown below. To compute the per-group statistics, I subset the data frame inside the function, but an error occurs when the argument `y` is used in that subsetting. How can I solve this? I know that `tapply` could be used, but my goal is to write my own function. Thank you.

sbyg<-function(dt,grp,y) {
# dt=data.frame, grp=group variable, y=value variable
ng<-length(unique(grp))
x<-as.vector(unique(grp))
statis<-matrix(nrow=ng,ncol=6)
for (i in 1:ng) {
  dta<-dt[grp==x[i],]
  attach(dta)
  statis[i,1]<-nrow(dta) # count
  statis[i,2]<-colSums(!is.na(dta))[1] # non-missing count
  statis[i,3]<-mean(dta[,y],na.rm=TRUE) # mean
  statis[i,4]<-median(dta[,y],na.rm=TRUE) # median
  statis[i,5]<-min(dta[,y],na.rm=TRUE)
  statis[i,6]<-max(dta[,y],na.rm=TRUE)
  detach(dta)
}
rownames(statis)<-x
colnames(statis)<-c("count","nonmiss","mean","median","min","max")
statis
}

sbyg(iris,Species,Sepal.Length)  # error occurs
hyunwoo jeong
  • You can use one of the group-by functions to do this, for example `aggregate`, `data.table`, or `dplyr`. – akrun Feb 04 '16 at 11:20
  • Same as [here](https://stackoverflow.com/questions/9847054/how-to-get-summary-statistics-by-group) – YCR Feb 04 '16 at 11:24

1 Answer

Call the function with the column names as quoted strings:

sbyg(iris,"Species","Sepal.Length")

unless iris is a data.table object (which it is not by default).
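To illustrate why the quotes matter, here is a minimal sketch. The helper `pick` is hypothetical and only mimics the `dta[,y]` indexing used in the function: a string is looked up as a column name, while a bare name is evaluated as an ordinary R object, which normally does not exist outside the data frame.

```r
# Hypothetical helper that indexes a column the same way sbyg does.
pick <- function(dt, y) dt[, y]

# Quoted name: the string is used to look up the column.
ok <- pick(iris, "Sepal.Length")

# Bare name: R tries to evaluate an object called Sepal.Length and fails,
# assuming no such object exists in the calling environment.
msg <- tryCatch(pick(iris, Sepal.Length),
                error = function(e) conditionMessage(e))

print(length(ok))
print(msg)
```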

edit: Modified function:

sbyg<-function(dt,grp,y) {
  # dt=iris ; grp="Species"; y="Sepal.Length"
  ng<-length(unique(dt[, grp]))
  x<-as.vector(unique(dt[, grp]))
  statis<-matrix(nrow=ng,ncol=6)
  for (i in 1:ng) { # i <- 1
    dta<-dt[dt[, grp]==x[i],]
    statis[i,1]<-nrow(dta) # count
    statis[i,2]<-sum(!is.na(dta[,y])) # non-missing count of y
    statis[i,3]<-mean(dta[,y],na.rm=TRUE) # mean
    statis[i,4]<-median(dta[,y],na.rm=TRUE) # median
    statis[i,5]<-min(dta[,y],na.rm=TRUE)
    statis[i,6]<-max(dta[,y],na.rm=TRUE)
  }
  rownames(statis)<-x
  colnames(statis)<-c("count","nonmiss","mean","median","min","max")
  statis
}

That said, this loop-based function is not the most idiomatic approach. `tapply` computes statistics by group more directly.
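As a rough sketch of the `tapply` approach, the same table could be built without an explicit loop. The name `sbyg2` is hypothetical, and `grp` and `y` are assumed to be column names passed as strings:

```r
# Sketch only: per-group statistics via tapply instead of a for loop.
# sbyg2 is a hypothetical name; grp and y are column names given as strings.
sbyg2 <- function(dt, grp, y) {
  g <- dt[[grp]]  # grouping values
  v <- dt[[y]]    # values to summarise
  cbind(
    count   = tapply(v, g, length),
    nonmiss = tapply(v, g, function(z) sum(!is.na(z))),
    mean    = tapply(v, g, mean, na.rm = TRUE),
    median  = tapply(v, g, median, na.rm = TRUE),
    min     = tapply(v, g, min, na.rm = TRUE),
    max     = tapply(v, g, max, na.rm = TRUE)
  )
}

res <- sbyg2(iris, "Species", "Sepal.Length")
print(res)
```

Each `tapply` call returns one value per group, and `cbind` lines them up as columns with the group labels as row names.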

YCR
  • Thank you for your concise answer, YCR. Adding the quotation marks (") almost solved my problem, but the 'by group' part doesn't work. I wonder what the problem is. – hyunwoo jeong Feb 05 '16 at 00:34