Frequency count of two column in R

Question

I have two columns in data frame

I want to count frequency of both columns and get the result in this format

  y    m Freq
 2010  1 2
 2010  2 2
 2010  3 1
 2011  1 1
 2011  2 1

Not sure why this wasn't offered, but you could also just do `as.data.frame(table(df))` — David Arenburg, Feb 19 '17 at 11:46
The only problem with this solution is it does a true crosstab, which prints a row for the 0s too. This could blow up quickly if there are lots of m values that are not shared by most of the y values. — pyll, Jul 25 '17 at 15:09

score 41 · Accepted Answer · answered Jun 04 '12 at 10:40

41

If your data is dataframe df with columns y and m

library(plyr)
counts <- ddply(df, .(df$y, df$m), nrow)
names(counts) <- c("y", "m", "Freq")

answered Jun 04 '12 at 10:40

danas.zuokas

4,551
4
29
39

Rich Scriven · Answer 2 · 2018-01-31T16:52:20.907

16

I haven't seen a dplyr answer yet. The code is rather simple.

library(dplyr)
rename(count(df, y, m), Freq = n)
# Source: local data frame [5 x 3]
# Groups: V1 [?]
#
#       y     m  Freq
#   (int) (int) (int)
# 1  2010     1     2
# 2  2010     2     2
# 3  2010     3     1
# 4  2011     1     1
# 5  2011     2     1

Data:

df <- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
), class = "data.frame", row.names = c(NA, -7L))

edited Jan 31 '18 at 16:52

answered Apr 19 '16 at 00:09

Rich Scriven

97,041
11
181
245

2

2022 update: `count(df, y, m, name = "Freq")` – zx8754 May 26 '22 at 11:19

Richard · Answer 3 · 2016-04-18T23:56:41.313

8

A more idiomatic data.table version of @ugh's answer would be:

library(data.table) # load package
df <- data.frame(y = c(rep(2010, 5), rep(2011,2)), m = c(1,1,2,2,3,1,2)) # setup data
dt <- data.table(df) # transpose to data.table
dt[, list(Freq =.N), by=list(y,m)] # use list to name var directly

edited Apr 18 '16 at 23:56

answered May 25 '15 at 13:40

Richard

1,224
3
16
32

It's not about correct or incorrect. The answer is correct but maybe not so idiomatic. You should have probably just propose an edit instead of posting a competitive answer. – David Arenburg Apr 18 '16 at 18:14
Hi @DavidArenburg, thanks for the feedback. I guess, I got overexcited about the .N notation in data.table (although I seem to recall that at the time Ugh's answer didn't seem to work for me). Upon your suggestion I edited ugh's answer and will remove this one, if the edits are accepted. Also changed the 'incorrect' part in my own answer. – Richard Apr 18 '16 at 23:54
1

@DavidArenburg - You should not edit an answer to change to a better alternative ([see code review guidelines](http://meta.stackexchange.com/questions/155538/what-are-the-guidelines-for-reviewing/155539#155539)). Richard, creating your own answer was the correct thing to do ;P. – Scopey Apr 19 '16 at 01:22
@Scopey, well it is also written there that it helps to know the language. In this case, the answer did work and wasn't wrong (as stated by the OP) and the only change was to copy/paste the answer and switch one working function by another which does exactly the same. This borders both rudeness and plagiarism. By your logic, we would have 30-40 answers on every question with exactly the same code with small changes in functions that are doing exactly the same. Maybe I should also post an exactly the same answer using `setDT` instead of `data.table` because it's more idiomatic? – David Arenburg Apr 19 '16 at 08:20
The answer is - No, I shouldn't. Because the thread will become a mess with tons of identical answers. Either way, I haven't seen the suggested edit, but what I meant was is to suggests a small alternative in the code instead of replacing the original answer. Eitherway, when you are not familiar with the language it is better to skip most of the times. This reminds me of [this](http://meta.stackoverflow.com/questions/283082/reviewing-process-for-tags-youre-unfamiliar-with) and [this](http://meta.stackoverflow.com/questions/319486/can-i-refactor-current-code-answers-and-post-it-as-new-answers/) – David Arenburg Apr 19 '16 at 08:24
Btw, @Richard, if we already at it. `data.table` creates a copy. You could just do `setDT(df)[, .(Freq =.N), by=.(y,m)]` instead (no need to create `dt` at all). Or you could create `dt` directly using `dt <- data.table(y = c(rep(2010, 5), rep(2011,2)), m = c(1,1,2,2,3,1,2))` without creating `df` in the first place. – David Arenburg Apr 19 '16 at 08:26
1

@DavidArenburg thanks for these helpful pointers for a more compressed answer. I purposively created the dataframe, so the OP would now how to do that next he asks a question. Also, in my workflow I often create copies, so that if I get stuck at one point, I can simple continue from the points where I left off, without having the re-read the entire R file. – Richard Apr 22 '16 at 11:24

score 5 · Answer 4 · edited Jun 15 '15 at 10:35

5

Using sqldf:

sqldf("SELECT y, m, COUNT(*) as Freq
       FROM table1
       GROUP BY y, m")

edited Jun 15 '15 at 10:35

zx8754

52,746
12
114
209

answered Jun 04 '12 at 10:11

Kshitij

8,474
2
26
34

score 4 · Answer 5 · answered Jun 04 '12 at 14:23

If you had a very big data frame with many columns or didn't know the column names in advance, something like this might be useful:

library(reshape2)
df_counts <- melt(table(df))
names(df_counts) <- names(df)
colnames(df_counts)[ncol(df_counts)] <- "count"
df_counts    

  y    m     count
1 2010 1     2
2 2011 1     1
3 2010 2     2
4 2011 2     1
5 2010 3     1
6 2011 3     0

score 4 · Answer 6 · answered Mar 05 '19 at 15:53

Here is a simple base R solution using table() and as.data.frame()

df2 <- as.data.frame(table(df1))
# df2 
     y m Freq
1 2010 1    2
2 2011 1    1
3 2010 2    2
4 2011 2    1
5 2010 3    1
6 2011 3    0

df2[df2$Freq != 0, ]
# output
     y m Freq
1 2010 1    2
2 2011 1    1
3 2010 2    2
4 2011 2    1
5 2010 3    1

Data

df1 <- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
                           2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
                           ), class = "data.frame", row.names = c(NA, -7L))

score 3 · Answer 7 · edited Jan 04 '13 at 23:15

3

library(data.table)

oldformat <- data.table(oldformat)  ## your orignal data frame
newformat <- oldformat[,list(Freq=length(m)), by=list(y,m)]

edited Jan 04 '13 at 23:15

Julius Vainora

47,421
9
90
102

answered Jan 04 '13 at 23:12

ugh

769
2
6
6

score -1 · Answer 8 · answered Mar 21 '22 at 10:52

Here another approach that I found here:

df<- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
                           2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
                           ), class = "data.frame", row.names = c(NA, -7L))

Two options:

aggregate(cbind(count = y) ~ m, 
          data = df, 
          FUN = function(x){NROW(x)})

or

aggregate(cbind(count = y) ~ m, 
          data = df, 
          FUN = length)

Frequency count of two column in R

8 Answers8

Linked

Related