4

I have a table which I need to populate with mean. I am currently using inefficient code that will take a long time on large data sets. Example:

Sample Data:

x = read.table(text="a b value mean
                     1 1 10 0
                     1 1 12 0
                     2 2 14 0
                     2 1 16 0", header=TRUE)

Code:

y <- aggregate(x$value, list(a = x$a,b = x$b), mean)
print(y)
#   a b  x
# 1 1 1 11
# 2 2 1 16
# 3 2 2 14

for (i in 1:4) {
  for (j in 1:3) {
    if (x$a[i]==y$a[j] && x$b[i]==y$b[j]) {
      x$mean[i]=y$x[j] }
  }
}
print(x) # This is the final output
#   a b value mean
# 1 1 1    10   11
# 2 1 1    12   11
# 3 2 2    14   14
# 4 2 1    16   16

I want to be able to get from the input to the output with efficient code. I am new to R so many thanks for helping out!

kdauria
  • 6,300
  • 4
  • 34
  • 53
  • Can you explain why you feel that it's inefficient and what you have tried to make it more efficient? – dethtron5000 Jan 21 '14 at 18:44
  • 1
    Questions about improving the efficiency of working code may be more appropriate on Code Review (http://codereview.stackexchange.com/) – Brian Diggs Jan 21 '14 at 18:45

3 Answers3

7

data.table is the way to go:

library(data.table)
x.dt <- data.table(x[1:3])               # convert first three cols
x.dt[, mean:=mean(value), by=list(a, b)] # add back mean
#    a b value mean
# 1: 1 1    10   11
# 2: 1 1    12   11
# 3: 2 2    14   14
# 4: 2 1    16   16

data.table is very fast.

BrodieG
  • 51,669
  • 9
  • 93
  • 146
5

You are looking for ave:

x <- transform(x, mean = ave(value, a, b, mean))

#   a b value mean
# 1 1 1    10   11
# 2 1 1    12   11
# 3 2 2    14   14
# 4 2 1    16   16
Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
3

The merge function will match on the columns with the same name in x and y (a and b):

x = data.frame(a=c(1, 1, 2, 2), b=c(1, 1, 2, 1), value=c(10, 12, 14, 16))
y = aggregate(x$value, list(a=x$a, b=x$b), mean)
merge(x, y, sort=F)
#   a b value  x
# 1 1 1    10 11
# 2 1 1    12 11
# 3 2 2    14 14
# 4 2 1    16 16
josliber
  • 43,891
  • 12
  • 98
  • 133