4
df <- data.frame(
id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
value = c(4,3,1,3,4,6,6,1,8,4))

I want to get the max value within each id group. I tried the following but got an error saying the replacement has 4 rows and the data has 10, which I understand but don't know how to correct:

df$max.by.id <- aggregate(value ~ id, df, max)  

This is how I ended up successfully doing it:

max.by.id <- aggregate(value ~ id, df, max)  
names(max.by.id) <- c("id", "max")
df2 <- merge(df,max.by.id, by.x = "id", by.y = "id")
df2
#   id value max
#1  A1     4   8
#2  A1     4   8
#3  A1     8   8
#4  A2     3   3
#5  A2     3   3
#6  A2     1   3
#7  A3     6   6
#8  A3     4   6
#9  A4     1   6
#10 A4     6   6

Is there a better way? Thanks in advance.

Cath
seakyourpeak
  • do you need the result data.frame to be ordered by id ? – Cath Dec 17 '15 at 15:08
  • you should look at the object that `aggregate(value ~ id, df, max)` outputs before trying to add it as a column to your data frame – rawr Dec 17 '15 at 15:11
  • CathG's suggestion that I should first look at aggregate(value ~ id, df, max) is helpful. The expression aggregate(value ~ id, df, max) works fine. It's just that assigning it to a new variable within df does not work, which makes sense because df has a different number of rows than the result of aggregate(value ~ id, df, max). – seakyourpeak Dec 21 '15 at 23:49

3 Answers

7

ave() is the function for that task:

df$max.by.id <- ave(df$value, df$id, FUN=max) 

example:

df <- data.frame(
  id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
  value = c(4,3,1,3,4,6,6,1,8,4))

df$max.by.id <- ave(df$value, df$id, FUN=max) 

The result of ave() has the same length as the original vector of values (which is also the length of the grouping variables). Each group's result is placed at the positions of that group's rows. For more information, read the documentation of ave().
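To make the length difference concrete, here is a minimal base R sketch that compares what aggregate() and ave() return for the question's data. The names agg and per_row are illustrative only and do not appear in the original post.

```r
# Same data as in the question
df <- data.frame(
  id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
  value = c(4,3,1,3,4,6,6,1,8,4))

# aggregate() collapses to one row per id, so it cannot be assigned
# directly as a column of the 10-row data frame
agg <- aggregate(value ~ id, df, max)
nrow(agg)
# 4

# ave() returns one value per input row, aligned with the original order
per_row <- ave(df$value, df$id, FUN = max)
length(per_row)
# 10
per_row
# 8 3 6 3 8 6 6 3 8 6
```

Because per_row has one entry per row of df, it can be assigned as a new column without any merge step.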

jogo
  • ave() works perfectly! Thanks jogo. Not only does your solution work, but I now understand why my approach gave an error. – seakyourpeak Dec 21 '15 at 23:50
  • So eventually you will want to accept this answer (or any other) by clicking next to the voting. – jogo Dec 22 '15 at 07:42
  • I believe I need at least 150 reputation to be able to vote. I am new to R and Stack Overflow and have only 41 so far. I will eventually vote for your answer when I get voting rights. You introduced me to the ave function, which is very handy and which I have already used in other contexts. THANKS!!! – seakyourpeak Dec 22 '15 at 14:06
5

With data.table, you can compute the max by id "inside" the data. The new column is added by reference, with one value per id repeated across that id's rows:

library(data.table)
setDT(df)[, max.by.id := max(value), by=id]
df
#    id value max.by.id
# 1: A1     4         8
# 2: A2     3         3
# 3: A4     1         6
# 4: A2     3         3
# 5: A1     4         8
# 6: A4     6         6
# 7: A3     6         6
# 8: A2     1         3
# 9: A1     8         8
#10: A3     4         6
Cath
  • @akrun maybe it is, and it was just not obvious to me. I don't know the whole SO site by heart. We're a community and supposed to work together. Anyone who recognizes a Q as a dupe marks it as such. You're around SO far more than me, so it's normal that you recognize dupes better than I do. It would make a lot of people merry to see you closing dupes instead of answering them with an old answer of yours just to get even more rep – Cath Jan 26 '16 at 13:44
  • I am not going to close this one, but the way in which you targeted the regex solution is different. In regex, even a single character matters; you can ask AvinashRaj. Regarding the regex dupe, it was not obvious to me either. – akrun Jan 26 '16 at 13:46
  • @akrun dupes don't have to be the exact same thing; the OP can have a little work to do to adapt it. I'm not gonna spread on the subject, Tensibai explained it better than I would and I totally agree with him – Cath Jan 26 '16 at 13:48
2
tapply(df$value, df$id, max)
# A1 A2 A3 A4
#  8  3  6  6
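
The tapply() result is a named vector with one entry per id, so it can also serve as a lookup table to fill a per-row column. The following is a minimal sketch under that assumption; the name max_by_id is illustrative and not part of the original answer.

```r
# Same data as in the question
df <- data.frame(
  id = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
  value = c(4,3,1,3,4,6,6,1,8,4))

# One maximum per id, named by id
max_by_id <- tapply(df$value, df$id, max)

# Look up each row's id by name; as.vector() drops the names and
# the array dimension so the column is a plain numeric vector
df$max.by.id <- as.vector(max_by_id[as.character(df$id)])
df$max.by.id
# 8 3 6 3 8 6 6 3 8 6
```

Indexing by as.character(df$id) keeps the lookup name-based even if id is stored as a factor.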

library(plyr)
ddply(df, .(id), function(df){max(df$value)})
#   id V1
# 1 A1  8
# 2 A2  3
# 3 A3  6
# 4 A4  6

library(dplyr)
df %>% group_by(id) %>% arrange(desc(value)) %>% do(head(., 1))
# Source: local data frame [4 x 2]
# Groups: id [4]

#       id value
#   (fctr) (dbl)
# 1     A1     8
# 2     A2     3
# 3     A3     6
# 4     A4     6

UPDATE: If you need to keep every original row alongside its group maximum, use the following code.

library(plyr)
ddply(df, .(id), function(df){
  df$max.val = max(df$value)
  return(df)
})

library(dplyr)
df %>% group_by(id) %>% mutate(max.val=max(value))
# Source: local data frame [10 x 3]
# Groups: id [4]

#        id value max.val
#    (fctr) (dbl)   (dbl)
# 1      A1     4       8
# 2      A2     3       3
# 3      A4     1       6
# 4      A2     3       3
# 5      A1     4       8
# 6      A4     6       6
# 7      A3     6       6
# 8      A2     1       3
# 9      A1     8       8
# 10     A3     4       6
Ven Yao