-1

Let's say that I have a simple data frame in R, as follows:

#example data frame
a = c("red","red","green")
b = c("01/01/1900","01/02/1950","01/05/1990")
df = data.frame(a,b)
colnames(df)<-c("Color","Dates")

My goal is to count the number of dates (as a group, not individually) for each value in the "Color" column. So, the result would look like this:

#output should look like this:
a = c("red","green")
b = c("2","1")
df = data.frame(a,b)
colnames(df)<-c("Color","Dates")

Red was associated with two dates -- the dates themselves are unimportant, I'd just like to count the aggregate number of dates per color in the data frame.

knaslund

3 Answers

2

Or in base R:

sapply(split(df, df$Color), nrow)
# green   red 
#     1     2 
Ege Rubak
  • I like this one best. – Mike Wise Jan 06 '17 at 16:55
  • This is great. Thank you. A complication, however - let's say there is an NA in red, like this: `a=c("red","red","red","green")` `b=c("01/01/1900","01/02/1950","NA","01/05/1990")` `df=data.frame(a,b)` `colnames(df)<-c("Color","Dates")` ...could we not count the NA somehow? – knaslund Jan 06 '17 at 19:02
  • You could just start by omitting `NA` values: `df <- na.omit(df)` and then continue as before. It just occurred to me that you can simply use `table(df$Color)` to get what you want if you are really just counting the number of times each color occurs in the table (after removing `NA` values). – Ege Rubak Jan 07 '17 at 17:10
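
The NA handling discussed in the comments above can be sketched as follows. This is a minimal illustration, assuming the missing date is a true `NA` rather than the quoted string `"NA"` used in the comment (a quoted `"NA"` would first need converting, for example with `df$Dates[df$Dates == "NA"] <- NA`). The names `counts` and `non_missing` are made up for this example.

```r
# Example data with a genuinely missing date (NA, not the string "NA")
df <- data.frame(
  Color = c("red", "red", "red", "green"),
  Dates = c("01/01/1900", "01/02/1950", NA, "01/05/1990")
)

# Option 1: drop every row that contains an NA, then count rows per color
counts <- table(na.omit(df)$Color)
print(counts)

# Option 2: count only the non-missing dates within each color;
# unlike option 1, a color whose dates are all NA still appears, with a count of 0
non_missing <- tapply(!is.na(df$Dates), df$Color, sum)
print(non_missing)
```

Both options report 2 for red and 1 for green on this data; they differ only when a color has no valid dates at all.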
1

We can use data.table

library(data.table)
setDT(df)[, .(Dates = uniqueN(Dates)) , Color]
#   Color Dates
#1:   red     2
#2: green     1
akrun
  • This would work, but what if the dates are not unique? So, in red for example, both dates are "01/01/1900" ? – knaslund Jan 06 '17 at 16:40
  • @knaslund It will be 1 using this answer. What is your expected output for that case? Do you need `setDT(df)[, .(Dates = .N), Color]`? – akrun Jan 06 '17 at 16:40
  • ah, yes this seems like it will work fabulously! thank you! – knaslund Jan 06 '17 at 16:46
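
The difference raised in these comments, counting rows versus counting distinct dates, can be illustrated without extra packages. The sketch below uses base R's `tapply` as a stand-in for the data.table calls; `n_rows` plays the role of `.N` and `n_distinct` the role of `uniqueN(Dates)`. The duplicated date in red is an assumed example following the commenter's scenario.

```r
# Both red rows share the same date, as in the scenario from the comments
df <- data.frame(
  Color = c("red", "red", "green"),
  Dates = c("01/01/1900", "01/01/1900", "01/05/1990")
)

# Number of rows per color (comparable to .N in data.table)
n_rows <- tapply(df$Dates, df$Color, length)

# Number of distinct dates per color (comparable to uniqueN(Dates))
n_distinct <- tapply(df$Dates, df$Color, function(x) length(unique(x)))

print(n_rows)
print(n_distinct)
```

Here red has two rows but only one distinct date, so the two approaches disagree exactly when dates repeat within a color.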
0

Using the dplyr package from the tidyverse:

library(dplyr)
df %>% group_by(Color) %>% summarise(n())
# # A tibble: 2 × 2
#    Color `n()`
#   <fctr> <int>
# 1  green     1
# 2    red     2
Mike Wise