R - get the most common string value (mode) for a given time period

Question

I had hoped to use the `mode` function with ddply to find the most common string for a certain user by time period.

This relates significantly to this question and this question.

Using a data set similar to this:

Data <- data.frame(
    groupname = as.factor(sample(c("red", "green", "blue"), 100, replace = TRUE)),
    timeblock = sample(1:10, 100, replace = TRUE),
    someuser = sample(c("bob", "sally", "sue"), 100, replace = TRUE))

I'd tried:

groupnameagg <- ddply(Data, .(timeblock, groupname, someuser), summarise, groupmode = mode(groupname))

But that's not doing what I had expected. It returns:

> head(groupnameagg$groupmode)
[1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"

How can I find the most commonly occurring groupname by user by timeblock? With a result similar to:


    timeblock   username  mostcommongroupforuser
        1          bob     red
        1          sally   red
        1          sue     green
        2          bob     green
        2          sally   blue
        2          sue     red

If groupname is a factor whose levels are ordered, how might I get the highest level present in each timeblock?

In R, the `mode` function returns the storage mode, and the storage mode for factor variables is "numeric". You probably want to use `table` and `sort` and then pull out the last one from that result. – IRTFM, Jun 23 '13 at 20:59
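A minimal sketch of what the comment describes, using a small hand-made factor `g` (a hypothetical example, not the question's data). It shows that `mode` reports the storage mode, and that the most frequent value can be read off a sorted `table`.

```r
# Small deterministic factor so the result is easy to check by hand.
g <- factor(c("red", "blue", "red", "green", "red", "blue"))

# mode() reports how the object is stored, not the most frequent value.
storage <- mode(g)   # "numeric"

# Count each level, sort the counts ascending, and take the last name.
counts <- table(g)
most_common <- tail(names(sort(counts)), 1)   # "red" (3 of 6 values)
```

If several values tie for the highest count, this returns only one of them.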

score 4 · Accepted Answer · answered Jun 23 '13 at 21:33

I think `aggregate` should do the trick for both parts.

PART 1

aggregate(Data$groupname, by = list(Data$timeblock, Data$someuser),
     function(x) {
          ux <- unique(x)                         # distinct values in this group
          ux[which.max(tabulate(match(x, ux)))]   # value with the highest count
     })

PART 2

aggregate(Data$groupname, by = list(Data$timeblock, Data$someuser),
     function(x) {
         # highest factor level (by level order) present in this group
         levels(Data$groupname)[max(as.numeric(x))] })
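To make the two parts easy to check by hand, here is a sketch on a tiny data frame `toy` (hypothetical, not from the question) with an explicit level order. It uses the formula interface of `aggregate`, which keeps the column names in the result, and `table` with `which.max` in place of the `tabulate` idiom above.

```r
# Hypothetical toy data with levels ordered red < green < blue.
toy <- data.frame(
    groupname = factor(c("red", "red", "blue", "green", "green", "red"),
                       levels = c("red", "green", "blue")),
    timeblock = c(1, 1, 1, 2, 2, 2),
    someuser  = c("bob", "bob", "bob", "sue", "sue", "sue"))

# Part 1: most common groupname per timeblock and user.
most_common <- aggregate(groupname ~ timeblock + someuser, data = toy,
    FUN = function(x) names(which.max(table(x))))

# Part 2: highest level (by level order) present per timeblock and user.
highest <- aggregate(groupname ~ timeblock + someuser, data = toy,
    FUN = function(x) levels(x)[max(as.integer(x))])
```

For bob in timeblock 1 the values are red, red, blue, so Part 1 gives "red" and Part 2 gives "blue". For sue in timeblock 2 the values are green, green, red, so both parts give "green".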

agstudy · Answer 2 · 2013-06-23T21:48:45.220

score 0

Using ddply from plyr

ddply(Data, .(timeblock, groupname, someuser),
  function(x){as.character(
                    unique(
                      x$groupname[x$someuser==
                                    names(which.max(table(x$someuser)))
                        ]
                      )
                    )
              }
  )

    timeblock groupname someuser    V1
1          1      blue      bob  blue
2          1      blue    sally  blue
3          1     green      bob green
.........
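Because groupname is one of the grouping variables in the call above, every group contains a single groupname, so V1 repeats that column (as the output shows). For the result the question asks for, a possible variant groups by timeblock and someuser only. This is a sketch that assumes the plyr package is installed. The seed is arbitrary, and the column name `mostcommongroupforuser` is taken from the question's desired output.

```r
library(plyr)  # assumes the plyr package is installed

set.seed(1)    # arbitrary seed, only to make the sample reproducible
Data <- data.frame(
    groupname = as.factor(sample(c("red", "green", "blue"), 100, replace = TRUE)),
    timeblock = sample(1:10, 100, replace = TRUE),
    someuser = sample(c("bob", "sally", "sue"), 100, replace = TRUE))

# Group by timeblock and user only, then take the most frequent groupname.
result <- ddply(Data, .(timeblock, someuser), summarise,
    mostcommongroupforuser = names(which.max(table(groupname))))
head(result)
```

When two groupnames tie within a group, `which.max` returns the one that comes first in the factor's level order.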

edited Jun 23 '13 at 21:48

answered Jun 23 '13 at 21:01

agstudy


My sample dataframe was misleading. I've edited it to be clearer. – d-cubed Jun 23 '13 at 21:06
