I would like to add tapply results to the original data frame as a new column.

Here is my data frame:

 dat <- read.table(text = " category birds    wolfs     snakes
                   yes        3        9         7
                   no         3        8         4
                   no         1        2         8
                   yes        1        2         3
                   yes        1        8         3
                   no         6        1         2
                   yes        6        7         1
                   no         6        1         5
                   yes        5        9         7
                   no         3        8         7
                   no         4        2         7
                   notsure    1        2         3
                   notsure    7        6         3
                   no         6        1         1
                   notsure    6        3         9
                   no         6        1         1   ",header = TRUE)

I would like to add the mean of each category to the data frame as a column. I used tapply(dat$birds, dat$category, mean) to get the mean per category, but I couldn't find a way to add it to the data set so that a new column holds the mean of the relevant category for each row.

Sam Firke
migdal menora

3 Answers

You can use ave from base

  dat$mbirds <- with(dat, ave(birds, category, FUN=mean))
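
Since the question asks for the mean of each category, the same ave call can be applied to several columns at once. This is a minimal sketch, assuming a cut-down copy of the question's data and an "m" prefix for the new column names (both are illustrative choices, not part of the original answer):

```r
# Cut-down copy of the question's data (illustrative subset)
dat <- data.frame(
  category = c("yes", "no", "no", "yes", "notsure"),
  birds    = c(3, 3, 1, 1, 7),
  wolfs    = c(9, 8, 2, 2, 6),
  snakes   = c(7, 4, 8, 3, 3)
)

cols <- c("birds", "wolfs", "snakes")

# ave() returns a vector as long as its input, with every element
# replaced by the mean of its group, so it can be assigned directly.
dat[paste0("m", cols)] <- lapply(
  dat[cols],
  function(x) ave(x, dat$category, FUN = mean)
)
```

Because ave keeps the original row order, no matching step is needed afterwards.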

If you want to use tapply

  mbirds1 <- with(dat, tapply(birds, category, mean))
  dat$mbirds1 <- mbirds1[match(dat$category,names(mbirds1))]

  head(dat)
  #  category birds wolfs snakes mbirds mbirds1
  #1      yes     3     9      7  3.200   3.200
  #2       no     3     8      4  4.375   4.375
  #3       no     1     2      8  4.375   4.375
  #4      yes     1     2      3  3.200   3.200
  #5      yes     1     8      3  3.200   3.200
  #6       no     6     1      2  4.375   4.375
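
The tapply result is a vector named by category, so the match step can also be written as a lookup by name. This is a sketch under the assumption of a small stand-in data frame; the column name mbirds2 is hypothetical:

```r
# Small stand-in for the question's data
dat <- data.frame(
  category = c("yes", "no", "no", "yes", "notsure"),
  birds    = c(3, 3, 1, 1, 7)
)

# One mean per category, named "no", "notsure", "yes"
m <- tapply(dat$birds, dat$category, mean)

# Indexing by name repeats the group mean for every row.
# as.character() guards against category being a factor, where
# indexing would otherwise use the integer codes rather than the labels.
# as.vector() drops the names so the column is a plain numeric vector.
dat$mbirds2 <- as.vector(m[as.character(dat$category)])
```

Both forms give the same column; match is the more explicit of the two, while name indexing is shorter.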

Or you can use data.table, which is fast and adds the column by reference:

 library(data.table)
 setDT(dat)[,mbirds1:= mean(birds), by=category]
akrun

Here's an aggregate answer. Using a formula in its arguments makes it nice and simple.

> a <- aggregate(birds~category, dat, mean)
> cb <- cbind(dat, mean = a[,2][match(dat[[1]], a[,1])])
> head(cb)
#  category birds wolfs snakes  mean
#1      yes     3     9      7 3.200
#2       no     3     8      4 4.375
#3       no     1     2      8 4.375
#4      yes     1     2      3 3.200
#5      yes     1     8      3 3.200
#6       no     6     1      2 4.375
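
The aggregate result can also be joined back with merge instead of match. This is a sketch on assumed stand-in data, with the caveat that merge sorts the rows by category by default, so the original row order is not preserved:

```r
# Small stand-in for the question's data
dat <- data.frame(
  category = c("yes", "no", "no", "yes", "notsure"),
  birds    = c(3, 3, 1, 5, 7)
)

# One row per category holding the mean of birds
a <- aggregate(birds ~ category, dat, mean)

# Rename the mean column so it does not clash with dat$birds
names(a)[2] <- "mean"

# Join the per-category means onto every row of dat.
# Rows come back sorted by category, not in their original order.
merged <- merge(dat, a, by = "category")
```

If the original order matters, the match approach above is the safer choice.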
Rich Scriven

You can do this easily with the dplyr package:

library(dplyr)
dat <- dat %>% group_by(category) %>% mutate(mbirds=mean(birds))

More information about the dplyr package can be found in its documentation.

You can find approaches with other packages in akrun's answer.
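
For readers unfamiliar with %>%: it passes the value on its left as the first argument of the function call on its right. This is a sketch on assumed stand-in data showing the piped form next to the equivalent nested calls; it requires dplyr to be installed, and the names piped, nested and plain are illustrative:

```r
library(dplyr)

# Small stand-in for the question's data
dat <- data.frame(
  category = c("yes", "no", "no", "yes"),
  birds    = c(3, 3, 1, 5)
)

# Piped form: read left to right
piped <- dat %>% group_by(category) %>% mutate(mbirds = mean(birds))

# The same computation written as nested calls
nested <- mutate(group_by(dat, category), mbirds = mean(birds))

# The result must be assigned to keep the new column; ungroup() and
# as.data.frame() turn the grouped tibble back into a plain data frame.
plain <- as.data.frame(ungroup(piped))
names(plain)
```

Without the assignment, the new column is only printed and the original data frame stays unchanged.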

iugrina
  • Thanks a lot @iugrina, but when I run names(dat) I don't get the new variable "mbirds". How can I add it to the original data frame? I have two other questions: what is the meaning of %>%? Also, do you have any idea how to do this without the dplyr package? – migdal menora Sep 01 '14 at 11:45
  • I will adjust my answer – iugrina Sep 01 '14 at 11:52