So I have a data frame with occurrences of fish specimens belonging to several species (several specimens can belong to the same species). The data frame has a column with the species name of each specimen, as well as a grade from A to E that was previously assigned to every specimen belonging to that species (that's why both specimens of the species Tilapia zillii have the grade C: the grade is assigned to a whole species and not to a specimen individually).

What I want is to count, for each grade (from A to E), how many species have been assigned to it. Not how many specimens (which are the occurrences in this data frame), but how many species. In particular, I would prefer to get the number of species for one grade at a time: for example, one line of code to get the number of species with grade A, another to get the number of species with grade B, and so on.

species             | Grade
--------------------|------
Tilapia guineensis  | B
Tilapia zillii      | C
Fundulus rubrifrons | A
Eutrigla gurnardus  | D
Sprattus sprattus   | A
Gadus morhua        | E
Tilapia zillii      | C
Gadus morhua        | B

I tried this but it didn't work:

length(unique(df$species[df$grade=="A",]))
pppery
tadeufontes
OK, thanks for the answer, but how do I return just a numeric value for the number of species assigned to one grade at a time? First get the number of species with grade A, for example, and then a separate number for the number of species with grade B. – tadeufontes Sep 13 '19 at 22:25

2 Answers

The dplyr way would be

library(dplyr)

df %>%
  group_by(species, grade) %>%
  summarise(count = n())

#> # A tibble: 7 x 3
#> # Groups:   species [6]
#>   species             grade count
#>   <chr>               <chr> <int>
#> 1 Eutrigla gurnardus  D         1
#> 2 Fundulus rubrifrons A         1
#> 3 Gadus morhua        B         1
#> 4 Gadus morhua        E         1
#> 5 Sprattus sprattus   A         1
#> 6 Tilapia guineensis  B         1
#> 7 Tilapia zillii      C         2
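
Since the question asks for the number of distinct species per grade rather than the number of specimens, a variation on the pipeline above may be closer to what is wanted. This is only a minimal sketch: it assumes the columns are named `species` and `grade` (as in the output above) and rebuilds the data frame from the question's sample; the names `species_per_grade` and `n_A` are made up for illustration.

```r
library(dplyr)

# Sample data from the question (column names `species` and `grade` are assumed)
df <- data.frame(
  species = c("Tilapia guineensis", "Tilapia zillii", "Fundulus rubrifrons",
              "Eutrigla gurnardus", "Sprattus sprattus", "Gadus morhua",
              "Tilapia zillii", "Gadus morhua"),
  grade = c("B", "C", "A", "D", "A", "E", "C", "B"),
  stringsAsFactors = FALSE
)

# One row per grade; n_distinct() counts each species only once
species_per_grade <- df %>%
  group_by(grade) %>%
  summarise(n_species = n_distinct(species))

# Pull out a single number for one grade at a time
n_A <- species_per_grade %>%
  filter(grade == "A") %>%
  pull(n_species)
```

Changing `"A"` to another grade in the `filter()` call gives the number for that grade.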

The base R way (which also gives you the 0 counts) is to build a contingency table:

as.data.frame(table(df))

#>                species grade Freq
#> 1   Eutrigla gurnardus     A    0
#> 2  Fundulus rubrifrons     A    1
#> 3         Gadus morhua     A    0
#> 4    Sprattus sprattus     A    1
#> 5   Tilapia guineensis     A    0
#> 6       Tilapia zillii     A    0
#> 7   Eutrigla gurnardus     B    0
#> 8  Fundulus rubrifrons     B    0
#> 9         Gadus morhua     B    1
#> 10   Sprattus sprattus     B    0
#> 11  Tilapia guineensis     B    1
#> 12      Tilapia zillii     B    0
#> 13  Eutrigla gurnardus     C    0
#> 14 Fundulus rubrifrons     C    0
#> 15        Gadus morhua     C    0
#> 16   Sprattus sprattus     C    0
#> 17  Tilapia guineensis     C    0
#> 18      Tilapia zillii     C    2
#> 19  Eutrigla gurnardus     D    1
#> 20 Fundulus rubrifrons     D    0
#> 21        Gadus morhua     D    0
#> 22   Sprattus sprattus     D    0
#> 23  Tilapia guineensis     D    0
#> 24      Tilapia zillii     D    0
#> 25  Eutrigla gurnardus     E    0
#> 26 Fundulus rubrifrons     E    0
#> 27        Gadus morhua     E    1
#> 28   Sprattus sprattus     E    0
#> 29  Tilapia guineensis     E    0
#> 30      Tilapia zillii     E    0

Then subset or filter on species and grade in whatever way you prefer.
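
One possible way to do that subsetting, as a rough sketch: count the species whose `Freq` is greater than zero for a given grade. It assumes `df` contains only the two columns `species` and `grade`, built here from the question's sample; `counts`, `n_species_A` and `n_species_all` are illustrative names.

```r
# Sample data from the question (column names `species` and `grade` are assumed)
df <- data.frame(
  species = c("Tilapia guineensis", "Tilapia zillii", "Fundulus rubrifrons",
              "Eutrigla gurnardus", "Sprattus sprattus", "Gadus morhua",
              "Tilapia zillii", "Gadus morhua"),
  grade = c("B", "C", "A", "D", "A", "E", "C", "B"),
  stringsAsFactors = FALSE
)

# Long-format contingency table: one row per species/grade combination
counts <- as.data.frame(table(df))

# Number of species that occur at least once with grade "A"
n_species_A <- sum(counts$grade == "A" & counts$Freq > 0)

# All grades at once: count the non-zero cells in each grade column
n_species_all <- colSums(table(df) > 0)
```

`n_species_all` is a named vector, so `n_species_all["B"]` returns the number for a single grade.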

Marcus

Maybe you are interested in writing a function that returns the count for a given grade on request:

get_count_by_Grade <- function(df, Grade) {
     sum(df$Grade == Grade)
}

get_count_by_Grade(df, "A")
#[1] 2

get_count_by_Grade(df, "D")
#[1] 1
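
Note that `sum(df$Grade == Grade)` counts rows, so a species with several specimens is counted several times (grade C would give 2 on the sample data). If distinct species are wanted, a variant along the following lines might work. It is a sketch that assumes the columns are named `species` and `Grade`; `get_species_count_by_grade` is a hypothetical name. It is also essentially the attempt from the question without the trailing comma: `df$species` is a vector, so indexing it with `[ ... , ]` fails with an "incorrect number of dimensions" error.

```r
# Sample data from the question (column names `species` and `Grade` are assumed)
df <- data.frame(
  species = c("Tilapia guineensis", "Tilapia zillii", "Fundulus rubrifrons",
              "Eutrigla gurnardus", "Sprattus sprattus", "Gadus morhua",
              "Tilapia zillii", "Gadus morhua"),
  Grade = c("B", "C", "A", "D", "A", "E", "C", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct species carrying a given grade
get_species_count_by_grade <- function(df, grade) {
  # Index the species vector with a logical vector (no trailing comma),
  # then drop duplicates before counting
  length(unique(df$species[df$Grade == grade]))
}

n_A <- get_species_count_by_grade(df, "A")
n_C <- get_species_count_by_grade(df, "C")
```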
Ronak Shah