33

I have two columns in data frame

2010  1
2010  1
2010  2
2010  2
2010  3
2011  1
2011  2

I want to count frequency of both columns and get the result in this format

  y    m Freq
 2010  1 2
 2010  2 2
 2010  3 1
 2011  1 1
 2011  2 1 
Sunny Sunny
  • 3,204
  • 4
  • 25
  • 26
  • 4
    Not sure why this wasn't offered, but you could also just do `as.data.frame(table(df))` – David Arenburg Feb 19 '17 at 11:46
  • 1
    The only problem with this solution is it does a true crosstab, which prints a row for the 0s too. This could blow up quickly if there are lots of m values that are not shared by most of the y values. – pyll Jul 25 '17 at 15:09

8 Answers8

41

If your data is dataframe df with columns y and m

library(plyr)
counts <- ddply(df, .(df$y, df$m), nrow)
names(counts) <- c("y", "m", "Freq")
danas.zuokas
  • 4,551
  • 4
  • 29
  • 39
16

I haven't seen a dplyr answer yet. The code is rather simple.

library(dplyr)
rename(count(df, y, m), Freq = n)
# Source: local data frame [5 x 3]
# Groups: V1 [?]
#
#       y     m  Freq
#   (int) (int) (int)
# 1  2010     1     2
# 2  2010     2     2
# 3  2010     3     1
# 4  2011     1     1
# 5  2011     2     1

Data:

df <- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
), class = "data.frame", row.names = c(NA, -7L))
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
8

A more idiomatic data.table version of @ugh's answer would be:

library(data.table) # load package
df <- data.frame(y = c(rep(2010, 5), rep(2011,2)), m = c(1,1,2,2,3,1,2)) # setup data
dt <- data.table(df) # transpose to data.table
dt[, list(Freq =.N), by=list(y,m)] # use list to name var directly
Richard
  • 1,224
  • 3
  • 16
  • 32
  • It's not about correct or incorrect. The answer is correct but maybe not so idiomatic. You should have probably just propose an edit instead of posting a competitive answer. – David Arenburg Apr 18 '16 at 18:14
  • Hi @DavidArenburg, thanks for the feedback. I guess, I got overexcited about the .N notation in data.table (although I seem to recall that at the time Ugh's answer didn't seem to work for me). Upon your suggestion I edited ugh's answer and will remove this one, if the edits are accepted. Also changed the 'incorrect' part in my own answer. – Richard Apr 18 '16 at 23:54
  • 1
    @DavidArenburg - You should not edit an answer to change to a better alternative ([see code review guidelines](http://meta.stackexchange.com/questions/155538/what-are-the-guidelines-for-reviewing/155539#155539)). Richard, creating your own answer was the correct thing to do ;P. – Scopey Apr 19 '16 at 01:22
  • @Scopey, well it is also written there that it helps to know the language. In this case, the answer did work and wasn't wrong (as stated by the OP) and the only change was to copy/paste the answer and switch one working function by another which does exactly the same. This borders both rudeness and plagiarism. By your logic, we would have 30-40 answers on every question with exactly the same code with small changes in functions that are doing exactly the same. Maybe I should also post an exactly the same answer using `setDT` instead of `data.table` because it's more idiomatic? – David Arenburg Apr 19 '16 at 08:20
  • The answer is - No, I shouldn't. Because the thread will become a mess with tons of identical answers. Either way, I haven't seen the suggested edit, but what I meant was is to suggests a small alternative in the code instead of replacing the original answer. Eitherway, when you are not familiar with the language it is better to skip most of the times. This reminds me of [this](http://meta.stackoverflow.com/questions/283082/reviewing-process-for-tags-youre-unfamiliar-with) and [this](http://meta.stackoverflow.com/questions/319486/can-i-refactor-current-code-answers-and-post-it-as-new-answers/) – David Arenburg Apr 19 '16 at 08:24
  • Btw, @Richard, if we already at it. `data.table` creates a copy. You could just do `setDT(df)[, .(Freq =.N), by=.(y,m)]` instead (no need to create `dt` at all). Or you could create `dt` directly using `dt <- data.table(y = c(rep(2010, 5), rep(2011,2)), m = c(1,1,2,2,3,1,2))` without creating `df` in the first place. – David Arenburg Apr 19 '16 at 08:26
  • 1
    @DavidArenburg thanks for these helpful pointers for a more compressed answer. I purposively created the dataframe, so the OP would now how to do that next he asks a question. Also, in my workflow I often create copies, so that if I get stuck at one point, I can simple continue from the points where I left off, without having the re-read the entire R file. – Richard Apr 22 '16 at 11:24
5

Using sqldf:

sqldf("SELECT y, m, COUNT(*) as Freq
       FROM table1
       GROUP BY y, m")
zx8754
  • 52,746
  • 12
  • 114
  • 209
Kshitij
  • 8,474
  • 2
  • 26
  • 34
4

If you had a very big data frame with many columns or didn't know the column names in advance, something like this might be useful:

library(reshape2)
df_counts <- melt(table(df))
names(df_counts) <- names(df)
colnames(df_counts)[ncol(df_counts)] <- "count"
df_counts    

  y    m     count
1 2010 1     2
2 2011 1     1
3 2010 2     2
4 2011 2     1
5 2010 3     1
6 2011 3     0
Ben
  • 41,615
  • 18
  • 132
  • 227
4

Here is a simple base R solution using table() and as.data.frame()

df2 <- as.data.frame(table(df1))
# df2 
     y m Freq
1 2010 1    2
2 2011 1    1
3 2010 2    2
4 2011 2    1
5 2010 3    1
6 2011 3    0

df2[df2$Freq != 0, ]
# output
     y m Freq
1 2010 1    2
2 2011 1    1
3 2010 2    2
4 2011 2    1
5 2010 3    1

Data

df1 <- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
                           2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
                           ), class = "data.frame", row.names = c(NA, -7L))
nghauran
  • 6,648
  • 2
  • 20
  • 29
3
library(data.table)

oldformat <- data.table(oldformat)  ## your orignal data frame
newformat <- oldformat[,list(Freq=length(m)), by=list(y,m)]
Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
ugh
  • 769
  • 2
  • 6
  • 6
-1

Here another approach that I found here:

df<- structure(list(y = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 
                           2011L), m = c(1L, 1L, 2L, 2L, 3L, 1L, 2L)), .Names = c("y", "m"
                           ), class = "data.frame", row.names = c(NA, -7L))

Two options:

aggregate(cbind(count = y) ~ m, 
          data = df, 
          FUN = function(x){NROW(x)})

or

aggregate(cbind(count = y) ~ m, 
          data = df, 
          FUN = length)
emr2
  • 1,436
  • 7
  • 23