Counting unique factors in R

Question

I would like to know the number of unique dams which gave birth on each of the birth dates recorded. My data frame is similar to this one:

dam <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam,bdate)
mydf
#    dam      bdate
# 1 2A11 2009-10-01
# 2 2A11 2009-10-01
# 3 2A12 2009-10-01
# 4 2A12 2009-10-01
# 5 2A12 2009-10-01
# 6 4D23 2009-10-03
# 7 4D23 2009-10-03
# 8 1X23 2009-10-03

I used aggregate(dam ~ bdate, data=mydf, FUN=length), but it counts every birth recorded on each date, so a dam that gave birth more than once on the same date is counted more than once:

       bdate dam
1 2009-10-01   5
2 2009-10-03   3

Instead, I need to have something like this:

mydf2
       bdate dam
1 2009-10-01   2
2 2009-10-03   2

Your help is very much appreciated!

score 16 · Accepted Answer · answered May 05 '11 at 02:16


What about:

aggregate(dam ~ bdate, data=mydf, FUN=function(x) length(unique(x)))
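A self-contained run of this one-liner on the sample data from the question may help show what happens. For each bdate, aggregate passes the vector of dam values recorded on that date to FUN. unique() drops repeat births by the same dam, and length() then counts the remaining dams. The expected output in the comments assumes the question's sample data unchanged.

```r
# Sample data from the question
dam   <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam, bdate)

# FUN receives all dam values for one bdate; unique() removes repeated
# dams before length() counts them
mydf2 <- aggregate(dam ~ bdate, data = mydf,
                   FUN = function(x) length(unique(x)))
mydf2
#        bdate dam
# 1 2009-10-01   2
# 2 2009-10-03   2
```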

answered May 05 '11 at 02:16

Joshua Ulrich


Aaron left Stack Overflow · Answer 2 · 2011-05-05T12:39:55.950

5

You could also run unique on the data first:

aggregate(dam ~ bdate, data=unique(mydf[c("dam","bdate")]), FUN=length)
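The following is a minimal sketch of the unique-first idea, split into two steps so the intermediate result is visible. It assumes the question's sample data. After unique() each row is one distinct (dam, bdate) pair, so plain length() already counts distinct dams per date. The name dams_by_date is only for illustration.

```r
dam   <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam, bdate)

# Keep only the two columns that define "a dam on a date", then drop
# duplicate rows: each remaining row is one distinct (dam, bdate) pair
dams_by_date <- unique(mydf[c("dam", "bdate")])
nrow(dams_by_date)  # 4 distinct pairs

# Every row is now a distinct dam for its date, so length() is enough
res <- aggregate(dam ~ bdate, data = dams_by_date, FUN = length)
res
```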

You could also use table instead of aggregate, though the output is a little different (a named table rather than a data frame):

> table(unique(mydf[c("dam","bdate")])$bdate)

2009-10-01 2009-10-03 
         2          2
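If you want the table result in the same two-column shape as the aggregate output, as.data.frame() on a table returns one row per level. Its responseName argument renames the default "Freq" count column. The sketch below names the table argument bdate = ... so that the first column is called bdate. The names tab and counts are illustrative only.

```r
dam   <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam, bdate)

dams_by_date <- unique(mydf[c("dam", "bdate")])

# Naming the argument gives the table's single dimension the name "bdate"
tab <- table(bdate = dams_by_date$bdate)

# One row per date; responseName names the count column "dam"
counts <- as.data.frame(tab, responseName = "dam", stringsAsFactors = FALSE)
counts
```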

edited May 05 '11 at 12:39

answered May 05 '11 at 03:21

Aaron left Stack Overflow


+1 Nice idea to run `unique` first. Do note, however, that this will only work if `mydf` only contains `dam` and `bdate`. – Joshua Ulrich May 05 '11 at 03:50
@Joshua: that's exactly correct. I tried it on my data and could not get what I wanted. The line you provided did exactly what I was looking for, as my data contains about 60 other variables. – baz May 05 '11 at 06:57
If you do have other variables, then just use the two columns you want. See edit. – Aaron left Stack Overflow May 05 '11 at 12:39
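The pitfall raised in these comments can be reproduced with a single extra column. In the sketch below, calf_id is a hypothetical column added only to stand in for the "other variables" in a real data set. Because its value differs on every row, unique() on the whole data frame removes nothing. Subsetting to dam and bdate first gives the intended counts.

```r
dam   <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
# calf_id is hypothetical: a per-row value, like most extra variables
mydf <- data.frame(dam, bdate, calf_id = 1:8)

# unique() on all columns keeps all 8 rows, so births are still counted
wrong <- aggregate(dam ~ bdate, data = unique(mydf), FUN = length)
wrong$dam  # 5 3

# Restricting to the two relevant columns before unique() fixes it
right <- aggregate(dam ~ bdate, data = unique(mydf[c("dam", "bdate")]),
                   FUN = length)
right$dam  # 2 2
```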

Preston · Answer 3 · 2020-10-08T16:04:48.667

3

In dplyr you can use n_distinct():

library(tidyverse)
mydf %>%
  group_by(bdate) %>%
  summarize(dam = n_distinct(dam))
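For comparison, here is a base-R counterpart of the group_by()/n_distinct() pipeline that does not need dplyr. It swaps in tapply(), which is not part of this answer. tapply() splits mydf$dam by mydf$bdate and applies the function to each piece. The result is a named array rather than a tibble. The sketch assumes the question's sample data.

```r
dam   <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam, bdate)

# Group dam by bdate and count distinct values in each group
n_dams <- tapply(mydf$dam, mydf$bdate, function(x) length(unique(x)))
n_dams
```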

edited Oct 08 '20 at 16:04

answered Dec 08 '17 at 14:18

Preston


score 3 · Answer 4 · answered May 05 '11 at 07:01

This is just one example of how to think about the problem, and one approach to solving it.

split.mydf <- with(mydf, split(x = dam, f = bdate)) # each list element holds the dams for one date
# it's just a matter of counting unique dams
unique.mydf <- lapply(X = split.mydf, FUN = unique)
# and then count the number of unique elements
unilen.mydf <- lapply(unique.mydf, length)
# you can do these two last steps in one go like so
lapply(split.mydf, FUN = function(x) length(unique(x)))

as.data.frame(unlist(unilen.mydf)) # unlist() turns the list of counts into a named vector, shown as a one-column data frame

           unlist(unilen.mydf)
2009-10-01                   2
2009-10-03                   2
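A note on the split step: if the whole data frame is split instead of the dam column, length() on each piece returns its number of columns, not its number of rows or dams. On this sample data the column count happens to be 2 as well, which can hide the mistake. The sketch below checks this directly. The extra calf_id column is hypothetical and serves only to show that the result then follows the column count.

```r
dam   <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23")
bdate <- c("2009-10-01","2009-10-01","2009-10-01","2009-10-01",
           "2009-10-01","2009-10-03","2009-10-03","2009-10-03")
mydf <- data.frame(dam, bdate)

pieces <- split(mydf, mydf$bdate)   # list of data frames, one per date
first  <- pieces[["2009-10-01"]]

nrow(first)                # 5 births on this date
length(first)              # 2: length() of a data frame counts columns
length(unique(first$dam))  # 2 distinct dams: the count actually wanted

# With a hypothetical third column, length() reports 3 instead,
# although the number of distinct dams has not changed
first3 <- cbind(first, calf_id = seq_len(nrow(first)))
length(unique(first3))     # 3
```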

Nice example: especially useful for those with a slightly different problem who find this thread as it separates out the steps for easier understanding. — Aaron left Stack Overflow, May 05 '11 at 13:57
