R equivalent of SQL SELECT COUNT(*) ... GROUP BY

Question

I'm trying to find out how to count the number of occurrences of each distinct integer in a vector. E.g., how many 1s, 2s and 3s are there, without hard-coding == 1, 2, 3:

test_vec = c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3)

And how can I detect that I have added some 4s to the vector, and count them as well?

test_vec = c(test_vec,4,4,4)

I could do this with range() and a loop, but I wondered whether there is a general vectorised solution.

Edit: this is not the same question as the suggested duplicate, because that question doesn't ask about the general table situation (though its answers sensibly suggest it) but rather about checking hard-coded equality, sum(test_vec==x).

If you look at data through SQL-spectacles, `data.table` might be useful for you, not only for this purpose (as described by Colonel Beauvel) https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html — rmuc8, Apr 02 '15 at 13:33
possible duplicate of [Counting the Number of Elements With The Values of x in a Vector?](http://stackoverflow.com/questions/1923273/counting-the-number-of-elements-with-the-values-of-x-in-a-vector) — rmuc8, Apr 02 '15 at 14:05
No, see edit (and though the logical leap is not hard to make, the hard-coded context is why I didn't find that question when searching, and the best way I knew how to express what I wanted was in SQL). — Escher, Apr 02 '15 at 14:38

score 5 · Accepted Answer · answered Apr 02 '15 at 14:10


aggregate is very handy in this situation

> aggregate(data.frame(count = test_vec), list(value = test_vec), length)

  value count
1     1    10
2     2     7
3     3     4
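A minimal sketch of the same idea written with aggregate's formula interface, run on the extended vector so that the appended 4s show up as a new group. The column names value and count are illustrative choices, and count = 1 is a dummy column: only the number of rows per group matters, which length() reports.

```r
# Original vector plus the three appended 4s from the question
test_vec <- c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3)
test_vec <- c(test_vec, 4, 4, 4)

# Roughly: SELECT value, COUNT(*) AS count FROM df GROUP BY value
df <- data.frame(value = test_vec, count = 1)  # count = 1 is a dummy column
counts <- aggregate(count ~ value, data = df, FUN = length)
print(counts)
```

New values need no special handling: aggregate builds one group per distinct value it finds, sorted ascending.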


SeaSprite


That will be *very* useful for me, as the larger context of that little problem involves a lot of data.frames extracted from a database. Thanks. – Escher Apr 02 '15 at 14:39
I ended up accepting this (even though table is the obvious simplest answer) because I assume most people working with databases will have data.frames and this helped me the most. – Escher Apr 02 '15 at 15:12
I think this answer is rather confusing, as you operate on a vector in your example. Why make it so complicated by converting the vector to a data.frame? I'm certainly not an expert here, but why not write `aggregate(test_vec, length, by=list(test_vec))`? – rmuc8 Apr 02 '15 at 15:40
@rmuc8 Result-wise it is the same. The extra steps simply assign column names to make the result readable and ease downstream processing. – SeaSprite Apr 02 '15 at 15:57
@rmuc8 I selected it as the most useful answer in the context of the question (data comes directly out of databases into data.frames via `RMySQL`). The example I chose was a vector for simplicity, but in reality anyone searching for how to do a grouped count in R *instead of* in their database will probably find this the most *useful* in their circumstances. I'm also grateful to the other respondents who show how to think about the problem in more diverse, "native" R ways. – Escher Apr 02 '15 at 16:20
Please note that the first argument of aggregate (x) is only used to compute the aggregation function on its subsets. If the function you are calling is length, you don't care about the contents of those vectors, only their length, so you could just pass a vector of the same length with dummy values, e.g. `aggregate(rep(1,length(test_vec)), list(value = test_vec), length)`. – Luke Jun 24 '20 at 14:58

score 4 · Answer 2 · answered Apr 02 '15 at 13:17


You can use table:

table(test_vec)
test_vec
 1  2  3 
10  7  4
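Because table() derives its categories from the data, appended values such as the 4s get their own cell automatically. A small sketch, assuming you want an SQL-like two-column result; the counted values come back as character labels, so convert them if you need numbers.

```r
test_vec <- c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3, 4, 4, 4)

tab <- table(test_vec)
tab[["4"]]  # cells are indexed by character label, not by numeric value

# One row per distinct value, like the result of GROUP BY
counts_df <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)
counts_df$test_vec <- as.numeric(counts_df$test_vec)  # labels are character
print(counts_df)
```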


Mamoun Benghezal


score 2 · Answer 3 · edited Apr 02 '15 at 16:51


You can also use the data.table package to count the number of elements in each group.

library(data.table)
as.data.table(x = test_vec)[, .N, by=x]
#   x  N
#1: 1 10
#2: 2  7
#3: 3  4
#4: 4  3

.N is a special built-in variable holding a length-1 integer: the number of observations in each group.
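A sketch of the same grouping on the extended vector, assuming the data.table package is installed. It builds the table with data.table() so the column name is explicit, names the count column, and uses keyby instead of by so the result is sorted by the grouping column (by keeps the order of first appearance).

```r
library(data.table)

test_vec <- c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3, 4, 4, 4)
dt <- data.table(value = test_vec)

# Roughly: SELECT value, COUNT(*) AS count FROM dt GROUP BY value ORDER BY value
# .N is the number of rows in each group; keyby also sorts by value
counts <- dt[, .(count = .N), keyby = value]
print(counts)
```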

edited by Arun

answered Apr 02 '15 at 13:27

Colonel Beauvel


score 2 · Answer 4 · edited Apr 02 '15 at 14:10


The dplyr approach:

test_vec = c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3)
library(dplyr)
df <- data.frame(test_vec)  # base R; data_frame() has since been deprecated

df %>% 
    count(test_vec)

# Alternative that shows group_by
df %>%
    group_by(test_vec) %>%
    summarise(n = n()) # or tally()

#   test_vec  n
# 1        1 10
# 2        2  7
# 3        3  4
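A sketch on the extended vector, assuming a reasonably recent dplyr (the name argument of count() was added after this answer was written). It renames the count column and shows that the new value 4 appears as its own row without any extra code.

```r
library(dplyr)

test_vec <- c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3, 4, 4, 4)

# count(value) is roughly group_by(value) %>% summarise(count = n())
counts <- data.frame(value = test_vec) %>%
  count(value, name = "count")
print(counts)
```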


answered Apr 02 '15 at 13:38

JasonAizkalns


I made an edit; feel free to roll back if you don't like the `count` use but it's less typing. – Tyler Rinker Apr 02 '15 at 14:09

@TylerRinker fair enough - edited to show both only because the question was asking specifically for equivalents to SQL's Group By clause. – JasonAizkalns Apr 02 '15 at 14:11

score 1 · Answer 5 · edited Apr 02 '15 at 15:34


To answer the second part of your question:

> which(test_vec == 4)   # gives their positions in the vector, to "identify" them
[1] 22 23 24
> sum(test_vec == 4)     # counts the 4s in the vector
[1] 3
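To identify new values without hard-coding 4, one option is to compare the extended vector against the original. A minimal sketch, where old_vec and new_vec are hypothetical names for the vector before and after the 4s were appended; the last step generalises sum(test_vec == x) to every distinct value.

```r
old_vec <- c(1,2,3,1,2,1,2,1,1,1,2,1,3,1,2,3,2,1,2,1,3)
new_vec <- c(old_vec, 4, 4, 4)

# Distinct values that did not occur in the original vector
added_values <- setdiff(new_vec, old_vec)
# Positions of those new elements (a generalised which(test_vec == 4))
added_positions <- which(!new_vec %in% old_vec)

# Generalise sum(test_vec == x) by looping over every distinct value
vals <- sort(unique(new_vec))
counts <- vapply(vals, function(v) sum(new_vec == v), integer(1))
names(counts) <- vals
print(added_values)
print(added_positions)
print(counts)
```

Looping over the distinct values scans the vector once per value, so for the counting itself table() is usually the better choice; setdiff() and %in% are what make the "identify" part generic.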

Edit: since we're listing every option here,

tapply(test_vec, test_vec, length)

would also work:

 1  2  3 
10  7  4


answered Apr 02 '15 at 13:25

rmuc8

