Majority vote in R

Question

I need to calculate the majority vote for an item in R and I don't have a clue how to approach this.

I have a data frame with items and assigned categories. What I need is the category that was assigned the most often. How do I do this?

Data frame:

item   category
1      2
1      3
1      2
1      2
2      2
2      3
2      1
2      1

Result should be:

item   majority_vote
1      2
2      1

Take a look at the `table` function and the `plyr` package. This is a very common data manipulation, though, and you would likely benefit from reading any of the excellent R tutorials on the "split-apply-combine" strategy of data processing. — Justin, Jun 19 '13 at 21:35
Apologies, I'm away from the PC that has R installed, so I can't provide code. I think you're after the mode for each item. Combined with @Justin's suggestion, that should give you what you need. — Steph Locke, Jun 19 '13 at 21:37
Thanks, I'll look at the things you suggested and, of course, at all the other strategies suggested. I'm impressed, I didn't expect that there would be that many ways of approaching this. — nantoki, Jun 20 '13 at 07:54

score 6 · Accepted Answer · answered Jun 19 '13 at 21:51

You could use two things here. First, this is how you get the most frequent item in a vector:

> v = c(1,1,1,2,2)
> names(which.max(table(v)))
[1] "1"

This is a character value, but we can easily apply as.numeric to it if necessary.
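One caveat with this idiom is ties: `which.max` returns only the first maximum, and `table` orders its entries by sorted label, so the smallest tied label wins silently. A minimal sketch of how to see that and how to list every tied value (the vector `tied` is made up for illustration and is not part of the question's data):

```r
v <- c(1, 1, 1, 2, 2)
most_frequent <- as.numeric(names(which.max(table(v))))  # 1

# Hypothetical vector with a tie: 1 and 3 both appear twice
tied <- c(3, 3, 1, 1)
counts <- table(tied)

# which.max() keeps only the first maximum, i.e. the smallest tied label
first_winner <- names(which.max(counts))             # "1"

# Comparing against the maximum count keeps every tied label
all_winners <- names(counts)[counts == max(counts)]  # "1" "3"
```

If ties matter for your vote, checking `length(all_winners) > 1` before accepting the result is one way to catch them.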

Once we know how to do that, we can use the grouping functionality of the data.table package to perform a per-item evaluation of what its most frequent category is. Here is the code for your example above:

> library(data.table)
> dt = data.table(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
> dt
   item category
1:    1        2
2:    1        3
3:    1        2
4:    1        2
5:    2        2
6:    2        3
7:    2        1
8:    2        1
> dt[,as.numeric(names(which.max(table(category)))),by=item]
   item V1
1:    1  2
2:    2  1

The new V1 column contains the numeric version of the most frequent category for each item. If you want to give it a proper name, the syntax is a little uglier:

> dt[,list(mostFreqCat=as.numeric(names(which.max(table(category))))),by=item]
   item mostFreqCat
1:    1           2
2:    2           1

I'd avoid `table` and would do smth like this instead: `dt[, .SD[, .N, by = category][order(-N)][1], by = item]` — eddi, Jun 19 '13 at 22:00

score 3 · Answer 2 · answered Jun 20 '13 at 04:47

One liner (using plyr):

ddply(dt, .(item), function(x) which.max(tabulate(x$category)))
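Note that `tabulate` counts positive integers only, so this one-liner fits the question's data but not categories that are zero, negative, or text. A rough sketch of the same per-item computation in base R (`split` plus `sapply` stand in for `ddply` here so that it runs without plyr; `df`, `by_item`, and `with_zero` are illustrative names):

```r
# The question's data
df <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                 category = c(2, 3, 2, 2, 2, 3, 1, 1))

# Split categories by item, then take the most frequent bin per item
by_item <- split(df$category, df$item)
votes <- sapply(by_item, function(x) which.max(tabulate(x)))
# votes is named by item: item 1 -> 2, item 2 -> 1

# Caveat: tabulate() silently ignores values below 1
with_zero <- c(0, 0, 0, 2)
tabulate_winner <- which.max(tabulate(with_zero))      # 2, although 0 is most frequent
table_winner <- names(which.max(table(with_zero)))     # "0"
```

For arbitrary labels, `table` is the safer counter, at the cost of returning the winner as a character string.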

topchef

19,091
9
63
102

score 1 · Answer 3 · answered Jun 19 '13 at 22:11

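 # dat is the data frame from the question (columns item and category).
 # names() extracts the winning category; without it the result is the vote count.
 tdat <- tapply(dat$category, dat$item, function(vec)
                as.numeric(names(sort(table(vec), decreasing=TRUE))[1]))
 data.frame(item=rownames(tdat), plurality_vote=tdat)

  item plurality_vote
1    1              2
2    2              1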

A more complex function would be needed to distinguish a plurality (possibly with ties) from a true majority.
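As a rough illustration of what such a function might look like, here is a sketch that labels each result as a majority, a plurality, or a tie. The name `classify_vote` and its return format are assumptions made for this example, and it assumes a non-empty vector:

```r
# Hypothetical helper: returns the winning label plus the kind of win.
classify_vote <- function(x) {
  counts <- table(x)                       # NA values are dropped by default
  winners <- names(counts)[counts == max(counts)]
  if (length(winners) > 1) {
    # Several categories share the top count
    return(list(winner = NA_character_, type = "tie"))
  }
  # A true majority needs more than half of all counted votes
  type <- if (max(counts) > sum(counts) / 2) "majority" else "plurality"
  list(winner = winners, type = type)
}

item1 <- classify_vote(c(2, 3, 2, 2))  # category 2 has 3 of 4 votes
item2 <- classify_vote(c(2, 3, 1, 1))  # category 1 has 2 of 4 votes
tied  <- classify_vote(c(1, 1, 2, 2))  # two categories with 2 votes each
```

On the question's data this reports a majority for item 1 but only a plurality for item 2, since category 1 holds exactly half of that item's votes.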

score 1 · Answer 4 · answered Jun 20 '13 at 01:31

If you have a function to calculate the mode, as in package prettyR, you can use aggregate:

require(prettyR)

aggregate(d$category, by=list(item=d$item), FUN=Mode)
#  item x
#1    1 2
#2    2 1
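If you would rather not add a package for this, a small mode function is easy to write yourself. The sketch below is not prettyR's implementation; `mode_value` is a hypothetical helper that keeps the input's type and, on ties, returns whichever tied value appears first in the data:

```r
# The question's data
d <- data.frame(item     = c(1, 1, 1, 1, 2, 2, 2, 2),
                category = c(2, 3, 2, 2, 2, 3, 1, 1))

# Hypothetical helper: most frequent value of x, same type as x
mode_value <- function(x) {
  ux <- unique(x)                          # distinct values in order of appearance
  ux[which.max(tabulate(match(x, ux)))]    # count positions, pick the largest
}

# Formula interface: one row per item, result column keeps the name "category"
res <- aggregate(category ~ item, data = d, FUN = mode_value)
```

Because `match` maps every value to a position in `ux`, this works for character and factor columns as well as numbers.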

Ferdinand.kraft

12,579
10
47
69
