5

I would like to use R to create a new column in my dataset that includes a maximum for each unique group. My data look like this:

group<-c("A","A","A","A","A","B","B","C","C","C")
replicate<-c(1,2,3,4,5,1,2,1,2,3)
x<-data.frame(cbind(group,replicate))

I'd like to create the third column as shown below - the maximum for each group.

group   replicate max.per.group
A       1         5       
A       2         5
A       3         5
A       4         5
A       5         5
B       1         2
B       2         2
C       1         3
C       2         3
C       3         3
Frank
  • 66,179
  • 8
  • 96
  • 180
Luke
  • 4,769
  • 5
  • 28
  • 36
  • @akrun That uses mean while this one has max: http://stackoverflow.com/questions/35617665/add-max-value-to-a-new-column-in-r (It is also marked as a dupe, but it's target has a bad title.) Or this one: http://stackoverflow.com/q/12379043/ though it has package tags. – Frank Feb 15 '17 at 06:27
  • @Frank Also, I feel the older question should not be marked as duplicate with newer ones. It should be vice versa. – Ronak Shah Feb 15 '17 at 06:32
  • @Frank That looks like more appropriate dupe target, though I would not close it as it is a newer one – akrun Feb 15 '17 at 06:40
  • @RonakShah I generally agree, but if the newer question has far superior answers, then I think it's okay to consider the older one the duplicate, just so that people get pointed towards the best answers. – pattivacek Feb 17 '17 at 14:25

7 Answers7

6

If you redefine x first (the cbind makes both columns factors),

x<-data.frame(group,replicate)

you can use this:

merge(x,aggregate(replicate~group,x,FUN=max),all.x=TRUE,by="group")
   group replicate.x replicate.y
1      A           1           5
2      A           2           5
3      A           3           5
4      A           4           5
5      A           5           5
6      B           1           2
7      B           2           2
8      C           1           3
9      C           2           3
10     C           3           3
James
  • 65,548
  • 14
  • 155
  • 193
3

Try

# This is how you create your data.frame
group<-c("A","A","A","A","A","B","B","C","C","C")
replicate<-c(1,2,3,4,5,1,2,1,2,3)
x<-data.frame(group,replicate) # here you don't need c()

# Here's my solution    
Max <- tapply(x$replicate, x$group,max)
data.frame(x, max.per.group=rep(Max, table(x$group)))
 group replicate max.per.group
1      A         1             5
2      A         2             5
3      A         3             5
4      A         4             5
5      A         5             5
6      B         1             2
7      B         2             2
8      C         1             3
9      C         2             3
10     C         3             3
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
3

Here is an other base R solution:

cbind(x, cummax=unlist(tapply(x$replicate, x$group, function(x) rep(max(x), length(x)))))
   group replicate cummax
A1     A         1      5
A2     A         2      5
A3     A         3      5
A4     A         4      5
A5     A         5      5
B1     B         1      2
B2     B         2      2
C1     C         1      3
C2     C         2      3
C3     C         3      3
johannes
  • 14,043
  • 5
  • 40
  • 51
3

The shared reproducible example suggests that you have the columns as factors. We need to convert them to numeric first.

We can try with base R ave. Finding out the max in each group.

x$max.per.group <- ave(x$replicate, x$group, FUN = function(x) max(as.numeric(x)))

#   group replicate max.per.group
#1      A         1             5
#2      A         2             5
#3      A         3             5
#4      A         4             5
#5      A         5             5
#6      B         1             2
#7      B         2             2
#8      C         1             3
#9      C         2             3
#10     C         3             3

Another option with dplyr

library(dplyr)
x %>%
   group_by(group) %>%
   mutate(max.per.group = max(as.numeric(replicate)))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

you can use the plyr package:

library(plyr)
> ddply(x, .(group), transform, max.per.group=max(replicate))
   group replicate max.per.group
1      A         1             5
2      A         2             5
3      A         3             5
4      A         4             5
5      A         5             5
6      B         1             2
7      B         2             2
8      C         1             3
9      C         2             3
10     C         3             3
> 
Justin
  • 42,475
  • 9
  • 93
  • 111
1

you can use rle - Run Length Encoding

# Create the data.frame
group <- c("A","A","A","A","A","B","B","C","C","C")
replicate <- c(1,2,3,4,5,1,2,1,2,3)
x <- data.frame(group,replicate)

# using 'rle'
z <- rle(as.numeric(x$group))$lengths
x$max.per.group <- rep(z, z)
x
  • This assumes that `replicate` starts at 1 for each group and increments by 1 for each subsequent entry. This assumption is true for the example (and likely for the more general problem), but it need not be and some of the other answers do not assume it. – Brian Diggs Jul 26 '12 at 21:54
0

We can use data.table assignment (:=) to create a column in place without copying

library(data.table)
setDT(x)[, max.per.group := max(replicate), by = group]
x
#     group replicate max.per.group
#1:     A         1             5
#2:     A         2             5
#3:     A         3             5
#4:     A         4             5
#5:     A         5             5
#6:     B         1             2
#7:     B         2             2
#8:     C         1             3
#9:     C         2             3
#10:    C         3             3

data

x <- data.frame(group,replicate)
akrun
  • 874,273
  • 37
  • 540
  • 662