R count values per type

Question

My question is related to this one from 2013: "R: Count unique values by category".

Using the following data in R:

    set.seed(1)
    mydf <- data.frame(
      Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
      Yr = c(rep(c("1999", "2000"), times = c(3, 2)), 
             "1999", "1999", "2000", "2000", "2000"),
      Plt = "20001",
      Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
      DBH = runif(10, 0, 15)
    )
    
    mydf
    #    Cnty   Yr   Plt       Spp       DBH
    # 1   185 1999 20001 Bitternut  3.089619
    # 2   185 1999 20001    Pignut  2.648351
    # 3   185 1999 20001    Pignut 10.305343
    # 4   185 2000 20001        WO  5.761556
    # 5   185 2000 20001 Bitternut 11.547621
    # 6    31 1999 20001        WO  7.465489
    # 7    31 1999 20001        WO 10.764278
    # 8    31 2000 20001    Pignut 14.878591
    # 9   189 2000 20001    Pignut  5.700528
    # 10  189 2000 20001 Bitternut 11.661678

What I'd like to do, which neither the previous asker nor the answerers covered, is:

Count how many counties each species occurs in. For this small example, that is easy to see with `table()`.

However, my real data has over a million rows, five different species, and an unknown (but very large) number of counties.

How can I get a table that gives me this answer:

    Spp       count_of_Counties
    Bitternut 2
    Pignut    3
    WO        2

instead of the following cross-tabulation:

    #             Cnty
    # Spp         185 189 31
    #   Bitternut   2   1  0
    #   Pignut      2   1  1
    #   WO          1   0  2

With my real data, that approach would produce well over 400,000 columns.
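One way to get the desired table without the wide cross-tabulation is to drop duplicate species/county pairs first and then count what remains per species; the count table then has one entry per species, however many counties there are. The base-R sketch below hard-codes the `Cnty` and `Spp` columns printed above instead of calling `sample()`, because R 3.6.0 changed the default sampling algorithm, so `set.seed(1)` no longer reproduces that printout on current R. The names `pairs` and `counts` are only illustrative.

```r
# Cnty and Spp columns copied from the printed mydf above (hard-coded so
# the result does not depend on the R version's sample() algorithm).
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Spp  = c("Bitternut", "Pignut", "Pignut", "WO", "Bitternut",
           "WO", "WO", "Pignut", "Pignut", "Bitternut")
)

# Keep one row per distinct species/county pair.
pairs <- unique(mydf[, c("Spp", "Cnty")])

# Each remaining row is one county for that species, so counting rows
# per species gives the number of counties it occurs in.
counts <- as.data.frame(table(Spp = pairs$Spp),
                        responseName = "count_of_Counties")
print(counts)
```

For the example data this gives 2 for Bitternut, 3 for Pignut and 2 for WO, matching the desired table. The `table()` call here only spans the species, so its size does not grow with the number of counties.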

You could try data.table: `myDT[, .N, by=c("Spp", "Cnty")]` gives you counts by species and county quite easily, and this approach scales to millions of records. – Sun Bee, Sep 15 '16 at 07:18
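The query in this comment returns one row per species/county pair, so counting the rows of its result per species gives the number of counties; data.table's `uniqueN()` does the same count in one step. A sketch, assuming the data.table package is installed and using the `Cnty` and `Spp` columns from the printed example (`myDT` is the name used in the comment; `by_pair`, `from_pairs` and `direct` are illustrative):

```r
library(data.table)

# Cnty and Spp columns copied from the printed example data.
myDT <- data.table(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Spp  = c("Bitternut", "Pignut", "Pignut", "WO", "Bitternut",
           "WO", "WO", "Pignut", "Pignut", "Bitternut")
)

# The comment's query: one row per species/county pair, N = rows in that pair.
by_pair <- myDT[, .N, by = c("Spp", "Cnty")]

# Counting the pair rows per species gives the number of counties.
from_pairs <- by_pair[, .(count_of_Counties = .N), by = Spp]

# uniqueN() counts distinct counties directly, without the intermediate table.
direct <- myDT[, .(count_of_Counties = uniqueN(Cnty)), by = Spp]
print(direct)
```

Both results keep species in order of first appearance (Bitternut, Pignut, WO) and give 2, 3 and 2 for the example data.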

score 0 · Answer 1 · edited Sep 15 '16 at 07:03


How about this?

    library(dplyr)
    # Counts rows (observations) per species, not distinct counties
    mydf %>%
        group_by(Spp) %>%
        summarize(n = n())

            Spp n
    1 Bitternut 3
    2    Pignut 4
    3        WO 3

    # One row per species/county pair, then count those rows per species
    mydf %>%
        group_by(Spp, Cnty) %>%
        summarize(n = n()) %>%
        group_by(Spp) %>%
        summarize(count_of_Counties = n())


            Spp count_of_Counties
    1 Bitternut                 2
    2    Pignut                 3
    3        WO                 2

answered Sep 15 '16 at 06:22 by Sandipan Dey; edited by Sotos

No, this code just counts how many times each species appears. For example, Pignut appears 4 times in your answer, but it actually occurs in only 3 counties (185, 31, 189). – k BORT Sep 15 '16 at 06:32
I know there could be a better solution, but this is a quick hack; check out the updated code. – Sandipan Dey Sep 15 '16 at 06:45
Ah, cool. Will this code work if the dataset had many more rows and counties, but still just 3 species? – k BORT Sep 15 '16 at 06:47
It should work. – Sandipan Dey Sep 15 '16 at 06:53
Can confirm: I expanded the dataset, ran the code again, and got the correct answer. Thank you very much. – k BORT Sep 15 '16 at 06:55
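The two-step `group_by()`/`summarize()` chain in the updated answer can also be collapsed with dplyr's `n_distinct()`, which counts the unique values of `Cnty` within each species group. A minimal sketch, assuming dplyr is installed and again hard-coding the `Cnty` and `Spp` columns from the printed example (`result` is an illustrative name):

```r
library(dplyr)

# Cnty and Spp columns copied from the printed example data.
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Spp  = c("Bitternut", "Pignut", "Pignut", "WO", "Bitternut",
           "WO", "WO", "Pignut", "Pignut", "Bitternut")
)

# n_distinct() counts distinct counties within each species group,
# so no intermediate species/county summary is needed.
result <- mydf %>%
  group_by(Spp) %>%
  summarize(count_of_Counties = n_distinct(Cnty))
print(result)
```

Like the answer's version, this groups only by species, so the output stays at one row per species however many counties the data contains.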
