1

R Community: I am trying to to create a new variable based on the value of existing variable, not on a row-wise basis but rather on a group-wise basis. I'm trying to create max.var and min.var below based on old.var without collapsing or aggregating the rows, that is, preserving all the id rows:

id  old.var min.var max.var
1   1   1   3
1   2   1   3
1   3   1   3
2   5   5   11
2   7   5   11
2   9   5   11
2   11  5   11
3   3   3   4
3   4   3   4

structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), old.var = 
c(1L, 
2L, 3L, 5L, 7L, 9L, 11L, 3L, 4L), min.var = c(1L, 1L, 1L, 5L, 
5L, 5L, 5L, 3L, 3L), max.var = c(3L, 3L, 3L, 11L, 11L, 11L, 11L, 
4L, 4L)), .Names = c("id", "old.var", "min.var", "max.var"), class = "data.frame", row.names = c(NA, 
-9L))

I've tried using the aggregate and by functions, but they of course summarize the data. I haven't had much luck trying an Excel-like MATCH/INDEX approach either. Thanks in advance for your assistance!

user2621147
  • 111
  • 3
  • 10

3 Answers3

3

You can use dplyr,

df %>% 
   group_by(id) %>%
   mutate(min.var = min(old.var), max.var = max(old.var))

#Source: local data frame [9 x 4]
#Groups: id [3]

#     id old.var min.var max.var
#  (int)   (int)   (int)   (int)
#1     1       1       1       3
#2     1       2       1       3
#3     1       3       1       3
#4     2       5       5      11
#5     2       7       5      11
#6     2       9       5      11
#7     2      11       5      11
#8     3       3       3       4
#9     3       4       3       4
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Sotos
  • 51,121
  • 6
  • 32
  • 66
2

We can use data.table

library(data.table)
setDT(df1)[, c('min.var', 'max.var') := list(min(old.var), max(old.var)) , by = id]
df1
#   id old.var min.var max.var
#1:  1       1       1       3
#2:  1       2       1       3
#3:  1       3       1       3
#4:  2       5       5      11
#5:  2       7       5      11
#6:  2       9       5      11
#7:  2      11       5      11
#8:  3       3       3       4
#9:  3       4       3       4
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Using ave as docendo discimus pointed out in the question's comments:

df$min.var <- ave(df$old.var, df$id, FUN = min)
df$max.var <- ave(df$old.var, df$id, FUN = max)

Output:

  id old.var min.var max.var
1  1       1       1       3
2  1       2       1       3
3  1       3       1       3
4  2       5       5      11
5  2       7       5      11
6  2       9       5      11
7  2      11       5      11
8  3       3       3       4
9  3       4       3       4
mpalanco
  • 12,960
  • 2
  • 59
  • 67
  • I wanted to thank everyone. I _knew_ I couldn't be the first person with this question: I just wasn't using the right SO search terms. I selected mpalanco 's answer because I couldn't select the first answer from @docendo_discimus referenced here, and because it's the easiest for R newbies to understand, I think. – user2621147 May 27 '16 at 13:09