0

Using R, I would like to create a new column (MaxAct) showing the maximum numbers of a different column (ActNo) while grouping by two factors (HHID and PERID)

For example, I have this data set:

UID HHID PERID ActNo
1   1000 1     1
2   1000 1     2
3   1000 1     3
4   1000 2     1
5   1000 2     2
6   2000 1     1
7   2000 1     2
8   2000 1     3
9   2000 1     4
10  2000 2     1
11  2000 2     2

Then I want to add the new column (MaxAct) as follows:

UID HHID PERID ActNo MaxAct
1   1000 1     1     3
2   1000 1     2     3
3   1000 1     3     3
4   1000 2     1     2
5   1000 2     2     2
6   2000 1     1     4
7   2000 1     2     4
8   2000 1     3     4
9   2000 1     4     4
10  2000 2     1     2
11  2000 2     2     2
mnel
  • 113,303
  • 27
  • 265
  • 254
POTENZA
  • 1,377
  • 3
  • 17
  • 20
  • It's much easier for us to help you if you post a working example along with your question. Also, the command `dput()` would properly help you in making your working example. – Eric Fail Sep 11 '12 at 22:56
  • If this remains unsolved, your really _do_ need to use dput() to provide an illustration that has all of the aspects of your problem. Just posting what might be a console output often fails to represent the internal structure of you data object. – IRTFM Sep 12 '12 at 16:16

4 Answers4

6
dat$MaxAct <- with(dat, ave(ActNo, HHID, PERID, FUN=max) )

For problems involving single vectors and grouping where you want the length of the result to equal the row count, ave is your function of choice. For more complicated problems, the lapply(split(dat, fac), FUN) approach may be needed or use do.call(rbind, by( ...))

If you have missing values:

dat$MaxAct <- with(dat, ave(ActNo, HHID, PERID, FUN=function(x) max(x, na.rm=TRUE) )  )
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thank you very much, DWin... But I have an error message like "no non-missing arguments to max" Do you have any suggestions? – POTENZA Sep 11 '12 at 23:02
  • It works for me, but if your data has missing values (which your illustration did not suggest), then adding na.rm=TRUE as a further argument to `max` would be needed. – IRTFM Sep 11 '12 at 23:13
  • 2
    @DWin, the error "no non-missing arguments to max" is not thrown because of the presence of NA values. But because he is applying max to a NULL or 0 length argument: http://r.789695.n4.nabble.com/error-message-re-max-i-but-code-and-output-seen-O-K-td891889.html – Eduardo May 29 '13 at 13:45
6

This is standard fare for plyr with mutate or transform, base R ave or data.table (which might be considered a sledgehammer for a peanuts here).

The plyr and ave approaches has been addressed so

data.table

library(data.table)
DT <- data.table(DF)
DT[,MaxAct := max(ActNo), by = list(HHID, PERID)]

Given the size of the data the memory efficient and fast nature of data.table is perhaps not required.

having read your previous question How to Create a Column of Ranks While Grouping in R, so we know that max(ActNo) is simply the number of rows in each group then

DT[,MaxAct := .N, by = list(HHID, PERID)]

will work, and be marginally quicker.

Community
  • 1
  • 1
mnel
  • 113,303
  • 27
  • 265
  • 254
  • The ave approach was first, but it's so quick and elegant that you probably missed it. – IRTFM Sep 11 '12 at 23:10
  • 1
    Yarp, I've edited my response. Ave is elegant, I am starting to think data.table is elegant. Perhaps I should get out more. – mnel Sep 11 '12 at 23:11
4

There are several approaches in R to do achieve this task. For me, the easiest way to do this is to use the plyr package

require(plyr)
ddply(dat, .(HHID, PERID), transform, MaxAct = max(ActNo))

   UID HHID PERID ActNo MaxAct
1    1 1000     1     1      3
2    2 1000     1     2      3
3    3 1000     1     3      3
4    4 1000     2     1      2
5    5 1000     2     2      2
6    6 2000     1     1      4
7    7 2000     1     2      4
8    8 2000     1     3      4
9    9 2000     1     4      4
10  10 2000     2     1      2
11  11 2000     2     2      2
dickoa
  • 18,217
  • 3
  • 36
  • 50
1
df <- read.table(textConnection("UID HHID PERID ActNo
1   1000 1     1
2   1000 1     2
3   1000 1     3
4   1000 2     1
5   1000 2     2
6   2000 1     1
7   2000 1     2
8   2000 1     3
9   2000 1     4
10  2000 2     1
11  2000 2     2"), header=T)


> ddply(df, .(HHID, PERID), transform, MaxAct = length(unique(ActNo)) )
   UID HHID PERID ActNo MaxAct
1    1 1000     1     1      3
2    2 1000     1     2      3
3    3 1000     1     3      3
4    4 1000     2     1      2
5    5 1000     2     2      2
6    6 2000     1     1      4
7    7 2000     1     2      4
8    8 2000     1     3      4
9    9 2000     1     4      4
10  10 2000     2     1      2
11  11 2000     2     2      2
Maiasaura
  • 32,226
  • 27
  • 104
  • 108