I have a data.frame `mydf` with about 2500 rows. These rows correspond to 69 classes of objects in column 1, `mydf$V1`, and I want to count how many rows I have per object class. I can get a factor of these classes with:

objectclasses = unique(factor(mydf$V1, exclude="1"));

What's the terse R way to count the rows per object class? In any other language I'd traverse an array with a loop and keep a count, but I'm new to R programming and am trying to take advantage of R's vectorised operations.

zx8754
Escher

9 Answers


Or using the dplyr library:

library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters,100,rep=TRUE))
dat %>% 
  group_by(ID) %>%
  summarise(no_rows = length(ID))

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

The result is:

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
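For readers unfamiliar with the pipe, the chain above is equivalent to the following nested call — a sketch that rebuilds `dat` exactly as in the answer:

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, rep = TRUE))

# Nested form of: dat %>% group_by(ID) %>% summarise(no_rows = length(ID))
res <- summarise(group_by(dat, ID), no_rows = length(ID))
res
```

The pipe simply feeds the result of each step in as the first argument of the next, which is why the chained version reads left to right.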

Paul Hiemstra
  • This is exactly what I wanted. The table answer is also useful; there are a few problems with the data that prevent me using a table for the moment, so I am using a data.frame for the moment. – Escher Sep 30 '14 at 07:30
  • I'm new to R, but it seems this dplyr package is the jquery of R. It's the answer for a LOT of things. – Tim Coker Feb 17 '16 at 14:56
  • 5
    Using `table` instead would be better, as it doesn't require an extra library. – Yan Foto Aug 09 '16 at 07:37
  • @YanFoto I fail to see why limiting yourself to base R is preferable. By this logic you would favor using the base graphics over ggplot2, where ggplot2 is imo a lot better. I can imagine situations where your argument could hold (for example when for some reason you cannot install extra packages, or are stuck with an old version of R that does not support a particular package). However, as a blanket statement I do not agree that base R solutions are better than those using additional packages. – Paul Hiemstra Aug 09 '16 at 08:34
  • 1
    `ggplot2` actually provides an added value over `graphics`, whereas in this case the provided solution does exactly the same as what `table` would do for a factor. My comment refers the problem and the question at hand and *is* not a general statement regarding packages. – Yan Foto Aug 09 '16 at 08:39
  • I think dplyr has merit over table, especially if you consider the wider context of what dplyr can do. – Paul Hiemstra Aug 09 '16 at 08:41
  • 4
    I am on the same page with you on what `deplyr` can do. I think the misunderstanding is coming from my statement. I don't acclaim universality! I meant that as an opinion limited within the context of this question. Given a factor `f`, `table(f)` does the same thing as this solution suggests. – Yan Foto Aug 09 '16 at 08:48
  • @YanFoto I also feel for your argument, but even in this case I would not prefer table over dplyr. For me, using dplyr is default, using base R is the exception. Also note that these kind of comments are not the ideal way to get across a balanced point :). – Paul Hiemstra Aug 09 '16 at 08:51

Here are two ways to do it:

set.seed(1)
tt <- sample(letters,100,rep=TRUE)

## using table
table(tt)
tt
a b c d e f g h i j k l m n o p q r s t u v w x y z 
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
## using tapply
tapply(tt,tt,length)
a b c d e f g h i j k l m n o p q r s t u v w x y z 
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
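As a side note, a `table` can be converted into a two-column data.frame of value/count pairs, which is often handier for downstream work (exact counts depend on your R version's sampling RNG, so they may differ from the output shown above):

```r
set.seed(1)
tt <- sample(letters, 100, rep = TRUE)

# as.data.frame() turns the named table into columns `tt` and `Freq`
counts_df <- as.data.frame(table(tt))
counts_df
```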
agstudy

Using the plyr package:

library(plyr)

count(mydf$V1)

It returns the frequency of each value.
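To illustrate the shape of `count`'s output, here is a small hypothetical vector standing in for `mydf$V1` (the `x`/`freq` column names are plyr's defaults):

```r
library(plyr)

# Hypothetical stand-in for mydf$V1
v <- c("a", "a", "b", "c", "c", "c")

# count() returns a data.frame with columns `x` (value) and `freq` (count)
freq_df <- count(v)
freq_df
```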

Andriy T.

Using data.table

 library(data.table)
 setDT(dat)[, .N, keyby=ID] #(Using @Paul Hiemstra's `dat`)

Or using dplyr 0.3

 res <- count(dat, ID)
 head(res)
 #Source: local data frame [6 x 2]

 #  ID n
 #1  a 2
 #2  b 3
 #3  c 3
 #4  d 3
 #5  e 2
 #6  f 4

Or

  dat %>% 
      group_by(ID) %>% 
      tally()

Or

  dat %>% 
      group_by(ID) %>%
      summarise(n=n())
Arun
akrun

We can use summary on a factor column:

summary(myDF$factorColumn)
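A small hypothetical illustration of what summary returns for a factor (the column name above is whatever factor column your data frame has):

```r
# summary() on a factor gives a named vector of per-level counts
f <- factor(c("x", "y", "y", "z", "z", "z"))
s <- summary(f)
s
# x y z 
# 1 2 3 
```

Note this only works if the column really is a factor; on a character column, summary just reports the length and class.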
Spariant

One more approach is to apply the n() function, which counts the number of observations:

library(dplyr)
library(magrittr)
data %>% 
  group_by(columnName) %>%
  summarise(Count = n())
iamigham

When I just want to know how many unique factor levels exist in the data, I use:

length(unique(df$factorcolumn))
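One subtlety, shown with a small hypothetical factor: nlevels() counts declared levels, while length(unique()) counts values actually observed, so the two differ when a level has no observations:

```r
# "d" is declared as a level but never occurs in the data
f <- factor(c("a", "b", "b", "c"), levels = c("a", "b", "c", "d"))

length(unique(f))  # 3 values observed
nlevels(f)         # 4 levels declared
```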
Peter

Use the plyr package with lapply to get the frequency of every value (level) for every variable (factor) in your data frame.

library(plyr)
lapply(df, count)
  • this answer likely belongs as a comment. please review how to write a good answer - https://stackoverflow.com/help/how-to-answer – Claire May 10 '18 at 18:45

This is an old post, but you can do this with base R and no data frames/data tables:

sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
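A small hypothetical example of this approach (the `yTrain` vector here is made up for illustration):

```r
yTrain <- factor(c("cat", "dog", "dog", "bird"))

# For each level, count how many elements equal it
counts <- sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
counts
# bird  cat  dog 
#    1    1    2 
```

Unlike table(), this iterates over the levels explicitly, so it is slower on large vectors, but it needs nothing beyond base R.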
Victor