
My question is related to this one from 2013: "R: Count unique values by category".

I am using the following data in R:

set.seed(1)
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Yr = c(rep(c("1999", "2000"), times = c(3, 2)), 
         "1999", "1999", "2000", "2000", "2000"),
  Plt = "20001",
  Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
  DBH = runif(10, 0, 15)
)

mydf
#    Cnty   Yr   Plt       Spp       DBH
# 1   185 1999 20001 Bitternut  3.089619
# 2   185 1999 20001    Pignut  2.648351
# 3   185 1999 20001    Pignut 10.305343
# 4   185 2000 20001        WO  5.761556
# 5   185 2000 20001 Bitternut 11.547621
# 6    31 1999 20001        WO  7.465489
# 7    31 1999 20001        WO 10.764278
# 8    31 2000 20001    Pignut 14.878591
# 9   189 2000 20001    Pignut  5.700528
# 10  189 2000 20001 Bitternut 11.661678

What I'd like to be able to do, and what neither the previous asker nor the answerers did, is:

Count how many counties each species occurs in. On small data this is very simply done with the table function.

However, my data has over a million rows, five different species, and an unknown (but very large) number of counties.

How could I get a table that gives me an answer of:

Species    count_of_Counties
Bitternut  2
Pignut     3
WO         2

instead of the following answer:

#            Cnty
# Spp         185 189 31
#   Bitternut   2   1  0
#   Pignut      2   1  1
#   WO          1   0  2

If I attempt this solution on my real data, I will end up with well over 400,000 columns.

k BORT
  • You could try data.table `myDT[, .N, by=c("Spp", "Cnty")]`, which gives you counts by species by county quite easily. This solution scales to millions of records. – Sun Bee Sep 15 '16 at 07:18
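The snippet in that comment returns one row per species/county pair, so a second step is still needed to reach one row per species. A minimal sketch of that step, assuming the data.table package is installed; `myDT` here is hard-coded from the species and county columns printed in the question, and `county_counts` is just an illustrative name:

```r
library(data.table)

# Species/county pairs copied from the printed mydf in the question
# (hard-coded so the result does not depend on the random seed).
myDT <- data.table(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Spp  = c("Bitternut", "Pignut", "Pignut", "WO", "Bitternut",
           "WO", "WO", "Pignut", "Pignut", "Bitternut")
)

# uniqueN() counts the distinct counties within each species group,
# so the result stays long (one row per species) instead of going wide.
county_counts <- myDT[, .(count_of_Counties = uniqueN(Cnty)), by = Spp]

# Sort by species name for a stable display order.
setorder(county_counts, Spp)
county_counts
```

Because the grouping happens inside data.table and no cross-tabulation is built, the number of counties never turns into a number of columns.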

1 Answer


How about this?

library(dplyr)

# First attempt: this counts rows per species, not distinct counties
mydf %>% 
    group_by(Spp) %>% 
    summarize(n = n())

        Spp n
1 Bitternut 3
2    Pignut 4
3        WO 3

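# Collapse to one row per species/county pair first,
# then count the remaining rows per species
mydf %>% 
    group_by(Spp, Cnty) %>% 
    summarize(n = n()) %>% 
    group_by(Spp) %>% 
    summarize(count_of_Counties = n())

        Spp count_of_Counties
1 Bitternut                 2
2    Pignut                 3
3        WO                 2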
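
The two-step grouping above can probably be shortened with dplyr's `n_distinct()`. This is a sketch rather than part of the original answer; the data frame is hard-coded from the species and county columns printed in the question, and `county_counts` is an illustrative name:

```r
library(dplyr)

# Species/county pairs copied from the printed mydf in the question
# (hard-coded so the result does not depend on the random seed).
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Spp  = c("Bitternut", "Pignut", "Pignut", "WO", "Bitternut",
           "WO", "WO", "Pignut", "Pignut", "Bitternut")
)

# One row per species; n_distinct() counts the unique counties in each group.
county_counts <- mydf %>%
  group_by(Spp) %>%
  summarize(count_of_Counties = n_distinct(Cnty))

county_counts
```

This should give the same three counts as the two-step version while skipping the intermediate species/county table.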
Sotos
Sandipan Dey