I have a data frame:

DF <- data.frame(y1=c("AG","AG","AI","AI","AG","AI"),
      y0=c(2,2,1,1,2,1),
      y3=c(1994,1996,1997,1999,1994,1994),
      y4=c("AA","FB","AA","EB","AA","EB"),
      mw3wuus=c(26,34,22,21,65,78),
      Country_true=c("Antigua and  Barbuda","Antigua and  Barbuda","Anguilla","Anguilla","Antigua and  Barbuda","Anguilla"))

 DF
  y1 y0   y3 y4 mw3wuus         Country_true
1 AG  2 1994 AA      26 Antigua and  Barbuda
2 AG  2 1996 FB      34 Antigua and  Barbuda
3 AI  1 1997 AA      22             Anguilla
4 AI  1 1999 EB      21             Anguilla
5 AG  2 1994 AA      65 Antigua and  Barbuda
6 AI  1 1994 EB      78             Anguilla

I'm trying to create a new column that holds the mean of mw3wuus across all rows whose other columns are equal.

For instance, in this example every row keeps its own value except rows 1 and 5: they have the same values for y1, y0, y3, and y4, so I need the mean of their mw3wuus values.

nico
Dima Sukhorukov

3 Answers

5

You may want to try aggregate.

For instance:

aggregate(DF$mw3wuus, FUN=mean, 
          by=list(y1=DF$y1, y0=DF$y0, y3=DF$y3, y4=DF$y4))

This will give you:

  y1 y0   y3 y4    x
1 AG  2 1994 AA 45.5
2 AI  1 1997 AA 22.0
3 AI  1 1994 EB 78.0
4 AI  1 1999 EB 21.0
5 AG  2 1996 FB 34.0
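
As a possible variation, aggregate also has a formula interface, which keeps the original column name (mw3wuus) in the result instead of the generic x. The sketch below rebuilds the question's data frame (without Country_true, which is not needed for the grouping); the name res is only illustrative.

```r
# Rebuild the example data from the question (Country_true omitted for brevity)
DF <- data.frame(
  y1 = c("AG", "AG", "AI", "AI", "AG", "AI"),
  y0 = c(2, 2, 1, 1, 2, 1),
  y3 = c(1994, 1996, 1997, 1999, 1994, 1994),
  y4 = c("AA", "FB", "AA", "EB", "AA", "EB"),
  mw3wuus = c(26, 34, 22, 21, 65, 78)
)

# Formula interface: value column on the left, grouping columns on the right
res <- aggregate(mw3wuus ~ y1 + y0 + y3 + y4, data = DF, FUN = mean)
print(res)
```

Like the call above, this returns one row per group, so the two rows that share AG / 2 / 1994 / AA collapse into a single row with the mean 45.5.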
nico
  • My guess is the OP is looking for `with(DF, ave(mw3wuus, y1, y0, y3, y4, FUN = mean))` instead of `aggregate`... – David Arenburg Feb 10 '15 at 23:01
  • @DavidArenburg possibly... but why keep a duplicate row if you want the mean? – nico Feb 10 '15 at 23:03
  • Because they said they are trying to create a new column instead of aggregating the whole data set, but I may be wrong... – David Arenburg Feb 10 '15 at 23:04
  • 1
    @DavidArenburg fair point, let's see what the OP says, the question is not particularly clear on this point :) – nico Feb 10 '15 at 23:05
  • Works nicely. I was looking to manage my data to simplify it; to aggregate the data I can use duplicated. Thank you all, now I can go to sleep with a calm soul :) – Dima Sukhorukov Feb 10 '15 at 23:28
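
For readers who do want the mean as a new column next to the original rows, the ave suggestion from the comments can be sketched in base R as follows. The column name Mean is an arbitrary choice for illustration, and Country_true is left out of the rebuilt data frame for brevity.

```r
# Rebuild the example data from the question (Country_true omitted for brevity)
DF <- data.frame(
  y1 = c("AG", "AG", "AI", "AI", "AG", "AI"),
  y0 = c(2, 2, 1, 1, 2, 1),
  y3 = c(1994, 1996, 1997, 1999, 1994, 1994),
  y4 = c("AA", "FB", "AA", "EB", "AA", "EB"),
  mw3wuus = c(26, 34, 22, 21, 65, 78)
)

# ave() returns a vector as long as its input, with each element replaced by
# the group-wise result, so it can be assigned directly as a new column.
# FUN must be passed by name because the grouping variables go through `...`.
DF$Mean <- with(DF, ave(mw3wuus, y1, y0, y3, y4, FUN = mean))
print(DF)
```

All six rows are kept; rows 1 and 5 both get 45.5, and every other row keeps its own mw3wuus value as the mean.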
3

Using data.table:

library(data.table)
setDT(DF)[, Mean := mean(mw3wuus), by = .(y1, y0, y3, y4)][]
#    y1 y0   y3 y4 mw3wuus         Country_true Mean
# 1: AG  2 1994 AA      26 Antigua and  Barbuda 45.5
# 2: AG  2 1996 FB      34 Antigua and  Barbuda 34.0
# 3: AI  1 1997 AA      22             Anguilla 22.0
# 4: AI  1 1999 EB      21             Anguilla 21.0
# 5: AG  2 1994 AA      65 Antigua and  Barbuda 45.5
# 6: AI  1 1994 EB      78             Anguilla 78.0
David Arenburg
2

Or using the dplyr package:

library(dplyr)
DF %>% group_by(y1, y0, y3, y4) %>% summarise(x = mean(mw3wuus))
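
Note that summarise returns one row per group. If the goal is instead a new column alongside all the original rows, a sketch using mutate might look like the following. The names Mean and res are illustrative choices, and Country_true is omitted from the rebuilt data frame for brevity.

```r
library(dplyr)

# Rebuild the example data from the question (Country_true omitted for brevity)
DF <- data.frame(
  y1 = c("AG", "AG", "AI", "AI", "AG", "AI"),
  y0 = c(2, 2, 1, 1, 2, 1),
  y3 = c(1994, 1996, 1997, 1999, 1994, 1994),
  y4 = c("AA", "FB", "AA", "EB", "AA", "EB"),
  mw3wuus = c(26, 34, 22, 21, 65, 78)
)

# mutate() computes the mean within each group but keeps every input row
res <- DF %>%
  group_by(y1, y0, y3, y4) %>%
  mutate(Mean = mean(mw3wuus)) %>%
  ungroup()
print(res)
```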
Sam Firke