64

I have a data frame in R like this:

  ID   MONTH-YEAR   VALUE
  110   JAN. 2012     1000
  111   JAN. 2012     2000
         .         .
         .         .
  121   FEB. 2012     3000
  131   FEB. 2012     4000
         .           .
         .           .

So, for each month of each year there are n rows and they can be in any order(mean they all are not in continuity and are at breaks). I want to calculate how many rows are there for each MONTH-YEAR i.e. how many rows are there for JAN. 2012, how many for FEB. 2012 and so on. Something like this:

 MONTH-YEAR   NUMBER OF ROWS
 JAN. 2012     10
 FEB. 2012     13
 MAR. 2012     6
 APR. 2012     9

I tried to do this:

n_row <- nrow(dat1_frame %.% group_by(MONTH-YEAR))

but it does not produce the desired output.How can I do that?

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
Jason Donnald
  • 2,256
  • 9
  • 36
  • 49
  • try looking at ?aggregate. I think you want `aggregate(VALUE ~ MONTH-YEAR, data = dat1_frame, FUN = function(x){NROW(x)})`. – Andy Clifton Aug 13 '14 at 18:05
  • @AnandaMahto will `table()` give me number of rows for each month of each year? I have modified the above post to show what I want as an output – Jason Donnald Aug 13 '14 at 18:06
  • @AndyClifton I am not concerned about `VALUE` column in my data frame. I just want the total count/number of rows for each month. – Jason Donnald Aug 13 '14 at 18:11
  • @AnandaMahto there 3 columns. There is one column in the beginning which is sort of ID for each row but is not of my concern – Jason Donnald Aug 13 '14 at 18:20
  • I have added the ID column in my above data set example. I hope this what you meant by "reproducible example" – Jason Donnald Aug 13 '14 at 18:22
  • jason, `aggregate()` just needs something to count as function of the different values of `MONTH-YEAR`. In this case, I used `VALUE` as the thing to count. See answer. – Andy Clifton Aug 13 '14 at 20:00

8 Answers8

60

The count() function in plyr does what you want:

library(plyr)

count(mydf, "MONTH-YEAR")
zx8754
  • 52,746
  • 12
  • 114
  • 209
Head
  • 4,691
  • 3
  • 30
  • 18
49

Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.)) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L), 
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012", 
                                      "FEB. 2012", "FEB. 2012", 
                                      "MAR. 2012"), 
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)), 
                  .Names = c("ID", "MONTH.YEAR", "VALUE"), 
                  class = "data.frame", row.names = c(NA, -5L))

mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000

Here's the calculation of the number of rows per group, in two output display formats:

table(mydf$MONTH.YEAR)
# 
# FEB. 2012 JAN. 2012 MAR. 2012 
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
20

Try using the count function in dplyr:

library(dplyr)
dat1_frame %>% 
    count(MONTH.YEAR)

I am not sure how you got MONTH-YEAR as a variable name. My R version does not allow for such a variable name, so I replaced it with MONTH.YEAR.

As a side note, the mistake in your code was that dat1_frame %.% group_by(MONTH-YEAR) without a summarise function returns the original data frame without any modifications. So, you want to use

dat1_frame %>%
    group_by(MONTH.YEAR) %>%
    summarise(count=n())
Kate
  • 309
  • 2
  • 3
  • You can use variable names containing characters like `-` by quoting the string with backticks. For example, `\`MONTH-YEAR\`` – dmh Jul 09 '19 at 10:40
  • I use this recipe every time when I need to accomplish this kind of task. I find this recipe easy to read-easy to remember. Upvoted. – akash ansari Jul 16 '20 at 13:56
19

Using the example data set that Ananda dummied up, here's an example using aggregate(), which is part of core R. aggregate() just needs something to count as function of the different values of MONTH-YEAR. In this case, I used VALUE as the thing to count:

aggregate(cbind(count = VALUE) ~ MONTH.YEAR, 
          data = mydf, 
          FUN = function(x){NROW(x)})

which gives you..

  MONTH.YEAR count
1  FEB. 2012     2
2  JAN. 2012     2
3  MAR. 2012     1
Andy Clifton
  • 4,926
  • 3
  • 35
  • 47
  • 4
    I think `FUN = function(x){NROW(x)}` can be replaced by just `NROW` or `length`. – David Arenburg Jul 06 '17 at 10:17
  • Yes, correct. But imagine you wanted to use the same approach for a slightly different function, or you wanted to add an naomit() - you need to use the longer form shown here. – Andy Clifton Jul 06 '17 at 10:34
8

Just for completion the data.table solution:

library(data.table)

mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L), 
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012", 
                                      "FEB. 2012", "FEB. 2012", 
                                      "MAR. 2012"), 
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)), 
                  .Names = c("ID", "MONTH.YEAR", "VALUE"), 
                  class = "data.frame", row.names = c(NA, -5L))

setDT(mydf)
mydf[, .(`Number of rows` = .N), by = MONTH.YEAR]

   MONTH.YEAR Number of rows
1:  JAN. 2012              2
2:  FEB. 2012              2
3:  MAR. 2012              1
User2321
  • 2,952
  • 23
  • 46
6
library(plyr)
ddply(data, .(MONTH-YEAR), nrow)

This will give you the answer, if "MONTH-YEAR" is a variable. First, try unique(data$MONTH-YEAR) and see if it returns unique values (no duplicates).

Then above simple split-apply-combine will return what you are looking for.

won782
  • 696
  • 1
  • 4
  • 13
  • That won't work. `data$MONTH-YEAR` will try and subtract a vector called `YEAR` from the `MONTH` column. Either quote the name: `unique(data$"MONTH-YEAR")` or use a dot in the name. I'm pretty sure the `ddply` example will fail too. – Spacedman Dec 15 '14 at 10:58
  • @Spacedman Yeah you are right. Made a bad example of naming the attribute. – won782 Dec 15 '14 at 14:38
1

Here is another way of using aggregate to count rows by group:

my.data <- read.table(text = '
    month.year    my.cov
      Jan.2000     apple
      Jan.2000      pear
      Jan.2000     peach
      Jan.2001     apple
      Jan.2001     peach
      Feb.2002      pear
', header = TRUE, stringsAsFactors = FALSE, na.strings = NA)

rows.per.group  <- aggregate(rep(1, length(my.data$month.year)),
                             by=list(my.data$month.year), sum)
rows.per.group

#    Group.1 x
# 1 Feb.2002 1
# 2 Jan.2000 3
# 3 Jan.2001 2
Mark Miller
  • 12,483
  • 23
  • 78
  • 132
0

Suppose we have a df_data data frame as below

> df_data
   ID MONTH-YEAR VALUE
1 110   JAN.2012  1000
2 111   JAN.2012  2000
3 121   FEB.2012  3000
4 131   FEB.2012  4000
5 141   MAR.2012  5000

To count number of rows in df_data grouped by MONTH-YEAR column, you can use:

> summary(df_data$`MONTH-YEAR`)

FEB.2012 JAN.2012 MAR.2012 
   2        2        1 

enter image description here summary function will create a table from the factor argument, then create a vector for the result (line 7 & 8)

Tung Nguyen
  • 1,486
  • 2
  • 18
  • 13
  • 1
    Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content, and don't explain why someone should "try this". – abarisone Jul 20 '16 at 11:12