How should I write the group maximum for every member of that group?

Question

I have a data frame like this:

     Id relationship age
1  1001            1  60
2  1001            2  50
3  1001            3  20
4  1002            1  70
5  1002            2  68
6  1002            3  23
7  1002            3  27
8  1002            3  27
9  1002            3  23
10 1003            1  60
11 1003            2  40
12 1003            3  20
13 1003            3  20

I want to write the largest age within each Id to every member of that Id, in a new column named maxage. I need this result:

     Id relationship age maxage
1  1001            1  60     60
2  1001            2  50     60
3  1001            3  20     60
4  1002            1  70     70
5  1002            2  68     70
6  1002            3  23     70
7  1002            3  27     70
8  1002            3  27     70
9  1002            3  23     70
10 1003            1  60     60
11 1003            2  40     60
12 1003            3  20     60
13 1003            3  20     60

When I use this command, R says: Error: cannot allocate vector of size 1.5 Gb In addition: Warning messages: 1: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), : Reached total allocation of 4076Mb: see help(memory.size) 2: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), : Reached total allocation of 4076Mb: see help(memory.size) 3: In `[.data.frame`(x, c(m$xi, if (all.x) m$x.alone), c(by.x, seq_len(ncx)[-by.x]), : Reached total allocation of 4076Mb: see help(memory.size) – user3041372, Jan 21 '14 at 13:16
The code definitely works OK on small data. It sounds like you've run out of memory. Try starting a fresh R session if possible, or take @jlhoward's advice and use `data.table` to avoid copying and speed things up. – thelatemail, Jan 22 '14 at 02:06

jlhoward · Answer 1 · 2014-01-21T22:27:31.427

If your dataframe is df, then

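# Maximum age per Id (one row per group)
result <- aggregate(age~Id, df, max)
# Join the per-group maximum back onto every row of df
df <- merge(df,result,by="Id")
# merge() names the two age columns age.x and age.y; rename them
colnames(df)[3:4] <- c("age","max.age")
df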
#      Id relationship age max.age
# 1  1001            1  60      60
# 2  1001            2  50      60
# 3  1001            3  20      60
# 4  1002            1  70      70
# 5  1002            2  68      70
# 6  1002            3  23      70
# 7  1002            3  27      70
# 8  1002            3  27      70
# 9  1002            3  23      70
# 10 1003            1  60      60
# 11 1003            2  40      60
# 12 1003            3  20      60
# 13 1003            3  20      60
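Since the question asks for the maximum to be written into the existing data frame, a base R alternative that skips the merge step may also be worth trying. This is a minimal sketch using ave(), a different technique from the aggregate/merge approach above; the data is retyped from the question, and whether it fits in memory on the real dataset is untested here.

```r
# Example data retyped from the question
df <- data.frame(
  Id = c(1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1002,
         1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3),
  age = c(60, 50, 20, 70, 68, 23, 27, 27, 23, 60, 40, 20, 20)
)

# ave() applies FUN within each Id group and returns a vector with one
# value per original row, in the original row order, so it can be
# assigned directly as a new column of the same data frame.
df$maxage <- ave(df$age, df$Id, FUN = max)

df
```

Because no second data frame is built and joined, the row order and the other columns of df are left as they were.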

You can also do this with data.table, which I would actually recommend because it's simpler and faster.

library(data.table)
dt <- data.table(df)
# := adds max.age to dt by reference, computed within each Id
dt[,max.age:=max(age),by=Id]
# head(dt)
#      Id relationship age max.age
# 1: 1001            1  60      60
# 2: 1001            2  50      60
# 3: 1001            3  20      60
# 4: 1002            1  70      70
# 5: 1002            2  68      70
# 6: 1002            3  23      70
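If memory is tight, the copy made by data.table(df) can probably be avoided as well. The sketch below assumes a data.table version that provides setDT(), and uses the column name maxage from the question; the data is retyped from the question.

```r
library(data.table)

# Example data retyped from the question
df <- data.frame(
  Id = c(1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1002,
         1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3),
  age = c(60, 50, 20, 70, 68, 23, 27, 27, 23, 60, 40, 20, 20)
)

# setDT() converts df to a data.table in place instead of copying it
setDT(df)

# := adds the per-Id maximum as a new column of df by reference
df[, maxage := max(age), by = Id]

df
```

After this, df itself carries the new column; no second object is created.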

I need to write this max into a new column of the data frame I showed above, but this command writes it into a new data frame, so it doesn't seem right. – user3041372, Jan 21 '14 at 13:12

score 0 · Answer 2 · answered Jan 21 '14 at 07:41

Another option would be

> library(plyr)
> 
> ddply(ages, .(Id), function(df) {df$max.age = max(df$age); df})
     Id relationship age max.age
1  1001            1  60      60
2  1001            2  50      60
3  1001            3  20      60
4  1002            1  70      70
5  1002            2  68      70
6  1002            3  23      70
7  1002            3  27      70
8  1002            3  27      70
9  1002            3  23      70
10 1003            1  60      60
11 1003            2  40      60
12 1003            3  20      60
13 1003            3  20      60
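Note that ddply() returns a new data frame and leaves ages unchanged, so the result has to be assigned back for the column to appear in ages. A minimal sketch of that, using the transform idiom in place of the anonymous function and the column name maxage from the question (the data is retyped from the question):

```r
library(plyr)

# Example data retyped from the question
ages <- data.frame(
  Id = c(1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002, 1002,
         1003, 1003, 1003, 1003),
  relationship = c(1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3),
  age = c(60, 50, 20, 70, 68, 23, 27, 27, 23, 60, 40, 20, 20)
)

# transform() is applied to each Id group, adding maxage to that group's rows;
# assigning the result back to ages is what makes the new column persist.
ages <- ddply(ages, .(Id), transform, maxage = max(age))

ages
```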

datawookie


I think this command is a good idea, but it does not work completely on my system. I think there is a problem: it doesn't make a new column and write max.age into it! What should I do? – user3041372 Jan 21 '14 at 13:16
Please help me. I am in a bad situation and I have very little time :( This works like aggregate(a~b): it makes a new data frame and writes max.age into it, but I need max.age written into the data frame above. Another problem is that this command is very slow and my dataset is very large. – user3041372 Jan 21 '14 at 14:44
Sure, I am happy to help. I am not sure what the problem is, though. As far as I can tell, you want a new column with the maximum age per group. My solution above does this, so I am not sure what else you need. Can you please update your original question to show what your real data looks like? You can just do dput(head(ages)), presuming that your data are called "ages". – datawookie Jan 22 '14 at 07:22
If it is just a question of having "maxage" rather than "max.age" as the column name, simply make the change in my code. – datawookie Jan 22 '14 at 07:23
