
I have a data frame like this:

     Id relationship age
1  1001            1  60
2  1001            2  50
3  1001            3  20
4  1002            1  70
5  1002            2  68
6  1002            3  23
7  1002            3  27
8  1002            3  27
9  1002            3  23
10 1003            1  60
11 1003            2  40
12 1003            3  20
13 1003            3  20

I want to add a new column named maxage that holds the maximum age within each Id, repeated for every member of that Id. I need this result:

     Id relationship age maxage
1  1001            1  60     60
2  1001            2  50     60
3  1001            3  20     60
4  1002            1  70     70
5  1002            2  68     70
6  1002            3  23     70
7  1002            3  27     70
8  1002            3  27     70
9  1002            3  23     70
10 1003            1  60     60
11 1003            2  40     60
12 1003            3  20     60
13 1003            3  20     60
thelatemail
user3041372
  • When I use this command, R says: Error: cannot allocate vector of size 1.5 Gb In addition: Warning messages: 1: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), : Reached total allocation of 4076Mb: see help(memory.size) 2: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), : Reached total allocation of 4076Mb: see help(memory.size) 3: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), : Reached total allocation of 4076Mb: see help(memory.size) – user3041372 Jan 21 '14 at 13:16
    The code definitely works ok on small data. Sounds like you've run out of memory. Try starting a fresh R session if possible, or take @jlhoward's advice and use `data.table` to avoid copying and speed things up. – thelatemail Jan 22 '14 at 02:06

2 Answers


If your dataframe is df, then

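# Maximum age per Id (one row per Id)
result <- aggregate(age ~ Id, df, max)
# Join the per-Id maximum back onto every row of df
df <- merge(df, result, by = "Id")
# merge() names the two clashing columns age.x and age.y; rename them
colnames(df)[3:4] <- c("age", "max.age")
df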
#      Id relationship age max.age
# 1  1001            1  60      60
# 2  1001            2  50      60
# 3  1001            3  20      60
# 4  1002            1  70      70
# 5  1002            2  68      70
# 6  1002            3  23      70
# 7  1002            3  27      70
# 8  1002            3  27      70
# 9  1002            3  23      70
# 10 1003            1  60      60
# 11 1003            2  40      60
# 12 1003            3  20      60
# 13 1003            3  20      60

You can also do this with data.table, which I would actually recommend because it is simpler and faster.

library(data.table)
dt <- data.table(df)
# := adds max.age to dt by reference, one maximum per Id group
dt[, max.age := max(age), by = Id]
head(dt)
#      Id relationship age max.age
# 1: 1001            1  60      60
# 2: 1001            2  50      60
# 3: 1001            3  20      60
# 4: 1002            1  70      70
# 5: 1002            2  68      70
# 6: 1002            3  23      70
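
If installing data.table is not an option, a base R sketch with `ave` can also add the column without the intermediate `merge`. This is only an illustration: the sample data are re-created from the question, and the column is named `maxage` as the question asks.

```r
# Sample data re-created from the question
df <- data.frame(
  Id = c(1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1002,
         1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3),
  age = c(60, 50, 20, 70, 68, 23, 27, 27, 23, 60, 40, 20, 20)
)

# ave() applies FUN within each Id group and returns a vector
# as long as the input, so it can be assigned straight into df
df$maxage <- ave(df$age, df$Id, FUN = max)
df
```

Because `ave` returns one value per original row, the row order of `df` is left unchanged and no second data frame has to be joined back.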
jlhoward

Another option would be

> library(plyr)
> 
> ddply(ages, .(Id), function(df) {df$max.age = max(df$age); df})
     Id relationship age max.age
1  1001            1  60      60
2  1001            2  50      60
3  1001            3  20      60
4  1002            1  70      70
5  1002            2  68      70
6  1002            3  23      70
7  1002            3  27      70
8  1002            3  27      70
9  1002            3  23      70
10 1003            1  60      60
11 1003            2  40      60
12 1003            3  20      60
13 1003            3  20      60
datawookie
  • I think this command is a good idea, but it does not work completely on my system. I think it has a problem: it doesn't make a new column and write max.age in it! What should I do? – user3041372 Jan 21 '14 at 13:16
  • Please help me. I am in a bad situation and I have very little time :( This works like aggregate(a~b): it makes a new data frame and writes max.age in it, but I need max.age written into the data frame above. Another problem is that this command is very slow and my dataset is very large. – user3041372 Jan 21 '14 at 14:44
  • Sure, I am happy to help. I am not sure what the problem is, though. As far as I can tell, you want a new column with the maximum age per group. My solution above does this, so I am not sure what else you need. Can you please update your original question to show what your real data look like? You can just do dput(head(ages)), presuming that your data are called "ages". – datawookie Jan 22 '14 at 07:22
  • If it is just a question of having "maxage" rather than "max.age" as the column name, simply make the change in my code. – datawookie Jan 22 '14 at 07:23
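
One possible reading of the comments above is that the `ddply` result was printed but never stored: `ddply` returns a new data frame and leaves `ages` untouched, so the result has to be assigned back. A minimal sketch, assuming the data are called `ages` as in the answer and that plyr is installed; it passes `transform` instead of the anonymous function (a swapped-in form that should be equivalent here) and names the column `maxage`:

```r
library(plyr)

# Sample data re-created from the question, named `ages` as in the answer
ages <- data.frame(
  Id = c(1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1002,
         1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3),
  age = c(60, 50, 20, 70, 68, 23, 27, 27, 23, 60, 40, 20, 20)
)

# Assign the result back; without the assignment the new column
# only appears in the printed output and `ages` stays as it was
ages <- ddply(ages, .(Id), transform, maxage = max(age))
ages
```

For a very large dataset this may still be slow, since `ddply` splits the data into one piece per Id before recombining them.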