I have a data frame:

df <- structure(list(Transect = c("1", "1", "1", "2", "2", "2", "1", 
"1", "1", "1", "2", "2", "2", "1", "1", "1", "1", "2", "2", "2", 
"2", "1", "1", "1", "1", "2", "2", "2", "2", "1", "1", "2", "1", 
"1", "1", "1", "2", "2", "2", "1", "1", "1", "1", "2", "2", "2", 
"1", "1", "1", "1", "2", "2", "2"), Species = c("DOL", "STAR", 
"LOB", "DOL", "STAR", "URCH", "DOL", "STAR", "RCRAB", "LOB", 
"DOL", "STAR", "RCRAB", "DOL", "RCRAB", "STAR", "URCH", "STAR", 
"DOL", "URCH", "RCRAB", "DOL", "STAR", "RCRAB", "URCH", "DOL", 
"RCRAB", "URCH", "STAR", "CUNN", "LOB", "CUNN", "CUNN", "FLOU", 
"RCRAB", "LOB", "CUNN", "ACOD", "RCRAB", "LUMP", "CUNN", "RCRAB", 
"FLOU", "CUNN", "FLOU", "RCRAB", "CUNN", "RCRAB", "SCUL", "FLOU", 
"CUNN", "FLOU", "RCRAB"), DayofYear = c(228, 228, 228, 228, 228, 
228, 230, 230, 230, 230, 230, 230, 230, 234, 234, 234, 234, 234, 
234, 234, 234, 235, 235, 235, 235, 235, 235, 235, 235, 228, 228, 
228, 230, 230, 230, 230, 230, 230, 230, 234, 234, 234, 234, 234, 
234, 234, 235, 235, 235, 235, 235, 235, 235)), row.names = c(NA, 
-53L), class = "data.frame")

I want to create a column that counts the number of species per transect on each day. I have been using this code:

df1 <- df %>% group_by(Transect, Species, DayofYear) %>% mutate(count = n())

But it gives me a strange result:

[image: screenshot of the resulting data frame]

How do I fix it so that, for example, the count value for transect 1 on day 228 is 3, the value for transect 2 on day 228 is 3, etc.?

Thanks in advance!

1 Answer

To get the number of observations per Transect/DayofYear:

library(dplyr)

df %>% 
  count(Transect, DayofYear)

#  Transect DayofYear n
#1        1       228 5
#2        1       230 8
#3        1       234 8
#4        1       235 8
#5        2       228 4
#6        2       230 6
#7        2       234 7
#8        2       235 7

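This is essentially equivalent to the following (summarize() leaves the result grouped by Transect, whereas count() returns it ungrouped):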

df %>%
  group_by(Transect, DayofYear) %>%
  summarize(n = n())

To get the number of unique species appearing per Transect/DayofYear, removing duplicate appearances:

df %>% 
  distinct(Transect, Species, DayofYear) %>% 
  count(Transect, DayofYear)

#  Transect DayofYear n
#1        1       228 4
#2        1       230 6
#3        1       234 7
#4        1       235 7
#5        2       228 4
#6        2       230 5
#7        2       234 6
#8        2       235 6
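
Since the question asks for a new column rather than a summary table, here is a minimal sketch of one way to keep every row and attach the per-Transect/DayofYear counts with mutate(). The data frame `toy` and the column names `n_obs` and `n_species` are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Small illustrative data frame (hypothetical, not the question's data)
toy <- data.frame(
  Transect  = c("1", "1", "1", "2", "2"),
  Species   = c("DOL", "STAR", "DOL", "DOL", "URCH"),
  DayofYear = c(228, 228, 228, 228, 228)
)

# Group by Transect and DayofYear only (not Species), then add columns
toy_counts <- toy %>%
  group_by(Transect, DayofYear) %>%
  mutate(
    n_obs     = n(),                 # number of rows in the group
    n_species = n_distinct(Species)  # number of unique species in the group
  ) %>%
  ungroup()

print(toy_counts)
```

Leaving Species out of group_by() is the key difference from the code in the question, where each group contained only the rows for a single species.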

To give the column a name other than n, you could use count(Transect, DayofYear, name = "count").

Finally, if your data is weighted (e.g. it already includes a "count" column that you want to aggregate), you can use wt = VARIABLE so that each row adds the value of VARIABLE instead of counting as 1.
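
As a rough illustration of the weighted case, the sketch below assumes a hypothetical column `abundance` that already holds a per-row tally. The data frame `tallied` is invented for the example.

```r
library(dplyr)

# Hypothetical pre-aggregated data: each row already carries a tally
tallied <- data.frame(
  Transect  = c("1", "1", "2"),
  DayofYear = c(228, 228, 228),
  abundance = c(2, 3, 4)  # assumed column name, not in the original data
)

# Sum abundance within each Transect/DayofYear instead of counting rows
weighted <- tallied %>%
  count(Transect, DayofYear, wt = abundance, name = "count")

print(weighted)
```

Here transect 1 gets a count of 5 (2 + 3) rather than 2, because each row contributes its abundance value.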

Jon Spring