I have a data.frame `mydf` with about 2500 rows. These rows correspond to 69 classes of objects in column 1, `mydf$V1`, and I want to count how many rows I have per object class. I can get a factor of these classes with:

objectclasses = unique(factor(mydf$V1, exclude="1"));

What's the terse R way to count the rows per object class? In any other language I'd traverse an array with a loop and keep a count, but I'm new to R programming and am trying to take advantage of R's vectorised operations.

zx8754
Escher

9 Answers


Or using the dplyr library:

library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters,100,rep=TRUE))
dat %>% 
  group_by(ID) %>%
  summarise(no_rows = length(ID))

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

The result is:

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
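For readers unfamiliar with the pipe, the chain above is equivalent to the following nested call — a sketch that rebuilds `dat` exactly as in the answer:

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, rep = TRUE))

# Nested form of: dat %>% group_by(ID) %>% summarise(no_rows = length(ID))
res <- summarise(group_by(dat, ID), no_rows = length(ID))
res
```

The pipe simply feeds the result of each step in as the first argument of the next, which is why the chained version reads left to right.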

Paul Hiemstra
  • This is exactly what I wanted. The table answer is also useful; there are a few problems with the data that prevent me using a table for the moment, so I am using a data.frame for the moment. – Escher Sep 30 '14 at 07:30
  • I'm new to R, but it seems this dplyr package is the jquery of R. It's the answer for a LOT of things. – Tim Coker Feb 17 '16 at 14:56
  • 5
    Using `table` instead would be better, as it doesn't require an extra library. – Yan Foto Aug 09 '16 at 07:37
  • @YanFoto I fail to see why limiting yourself to base R is preferable. By this logic you would favor using the base graphics over ggplot2, where ggplot2 is imo a lot better. I can imagine situations where your argument could hold (for example when for some reason you cannot install extra packages, or are stuck with an old version of R that does not support a particular package). However, as a blanket statement I do not agree that base R solutions are better than those using additional packages. – Paul Hiemstra Aug 09 '16 at 08:34
  • 1
    `ggplot2` actually provides an added value over `graphics`, whereas in this case the provided solution does exactly the same as what `table` would do for a factor. My comment refers the problem and the question at hand and *is* not a general statement regarding packages. – Yan Foto Aug 09 '16 at 08:39
  • I think dplyr has merit over table, especially if you consider the wider context of what dplyr can do. – Paul Hiemstra Aug 09 '16 at 08:41
  • 4
    I am on the same page with you on what `deplyr` can do. I think the misunderstanding is coming from my statement. I don't acclaim universality! I meant that as an opinion limited within the context of this question. Given a factor `f`, `table(f)` does the same thing as this solution suggests. – Yan Foto Aug 09 '16 at 08:48
  • @YanFoto I also feel for your argument, but even in this case I would not prefer table over dplyr. For me, using dplyr is default, using base R is the exception. Also note that these kind of comments are not the ideal way to get across a balanced point :). – Paul Hiemstra Aug 09 '16 at 08:51

Here are two ways to do it:

set.seed(1)
tt <- sample(letters,100,rep=TRUE)

## using table
table(tt)
tt
a b c d e f g h i j k l m n o p q r s t u v w x y z 
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
## using tapply
tapply(tt,tt,length)
a b c d e f g h i j k l m n o p q r s t u v w x y z 
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
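As a side note, a `table` can be converted into a two-column data.frame of value/count pairs, which is often handier for downstream work (exact counts depend on your R version's sampling RNG, so they may differ from the output shown above):

```r
set.seed(1)
tt <- sample(letters, 100, rep = TRUE)

# as.data.frame() turns the named table into columns `tt` and `Freq`
counts_df <- as.data.frame(table(tt))
counts_df
```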
agstudy

Using the plyr package:

library(plyr)

count(mydf$V1)

It returns the frequency of each value.
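To illustrate the shape of `count`'s output, here is a small hypothetical vector standing in for `mydf$V1` (the `x`/`freq` column names are plyr's defaults):

```r
library(plyr)

# Hypothetical stand-in for mydf$V1
v <- c("a", "a", "b", "c", "c", "c")

# count() returns a data.frame with columns `x` (value) and `freq` (count)
freq_df <- count(v)
freq_df
```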

Andriy T.

Using data.table

 library(data.table)
 setDT(dat)[, .N, keyby=ID] #(Using @Paul Hiemstra's `dat`)

Or using dplyr 0.3

 res <- count(dat, ID)
 head(res)
 #Source: local data frame [6 x 2]

 #  ID n
 #1  a 2
 #2  b 3
 #3  c 3
 #4  d 3
 #5  e 2
 #6  f 4

Or

  dat %>% 
      group_by(ID) %>% 
      tally()

Or

  dat %>% 
      group_by(ID) %>%
      summarise(n=n())
Arun
akrun

We can use summary on a factor column:

summary(myDF$factorColumn)
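A small hypothetical illustration of what summary returns for a factor (the column name above is whatever factor column your data frame has):

```r
# summary() on a factor gives a named vector of per-level counts
f <- factor(c("x", "y", "y", "z", "z", "z"))
s <- summary(f)
s
# x y z 
# 1 2 3 
```

Note this only works if the column really is a factor; on a character column, summary just reports the length and class.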
Spariant

One more approach is to apply the n() function, which counts the number of observations:

library(dplyr)
library(magrittr)
data %>% 
  group_by(columnName) %>%
  summarise(Count = n())
iamigham

When I just want to know how many unique factor levels exist in the data, I use:

length(unique(df$factorcolumn))
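One subtlety, shown with a small hypothetical factor: nlevels() counts declared levels, while length(unique()) counts values actually observed, so the two differ when a level has no observations:

```r
# "d" is declared as a level but never occurs in the data
f <- factor(c("a", "b", "b", "c"), levels = c("a", "b", "c", "d"))

length(unique(f))  # 3 values observed
nlevels(f)         # 4 levels declared
```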
Peter

Use the plyr package with lapply to get the frequency of every value (level) for every variable (factor) in your data frame.

library(plyr)
lapply(df, count)
  • this answer likely belongs as a comment. please review how to write a good answer - https://stackoverflow.com/help/how-to-answer – Claire May 10 '18 at 18:45

This is an old post, but you can do this with base R and no data frames/data tables:

sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
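A small hypothetical example of this approach (the `yTrain` vector here is made up for illustration):

```r
yTrain <- factor(c("cat", "dog", "dog", "bird"))

# For each level, count how many elements equal it
counts <- sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
counts
# bird  cat  dog 
#    1    1    2 
```

Unlike table(), this iterates over the levels explicitly, so it is slower on large vectors, but it needs nothing beyond base R.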
Victor