0

I have time series data across several sites with species presence absence across years. I want to see which species persisted, dropped or added at each site during the whole timeseries. I am only trying to compare the first and last year i.e. 2000 and 2002. The information of species should be saved in a separate column fate. I am pasting example of data and desired result. Any help is highly appreciated.

structure(list(site = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,  2L,
 2L, 2L, 2L, 2L, 2L, 2L), year = c(2000L, 2000L, 2000L, 2000L,  2001L,
 2001L, 2001L, 2002L, 2002L, 2000L, 2000L, 2001L, 2001L,  2001L, 2001L,
 2002L, 2002L), spe = c("A", "B", "C", "D", "B",  "C", "D", "A", "D",
 "A", "B", "A", "B", "C", "D", "C", "D")), class = "data.frame",
 row.names = c(NA, -17L))

desired result

Limey
  • 10,234
  • 2
  • 12
  • 32
Gull
  • 107
  • 8
  • 1
    please add reproducible data by `dput` – Kian Apr 21 '21 at 10:34
  • You will also need to define exactly what defines "persist", "dropped" and "added". For example, if a specied is present in 2000 and 2002 but not in 2001, has it persisted or been dropped and added, or something else? – Limey Apr 21 '21 at 10:38
  • @Limey yes, I am only interested in first and last year. I have edited my question to be more specific. – Gull Apr 21 '21 at 10:40
  • @RonakShah thank you and sorry for this. I am reading about it and next time, I will be careful. – Gull Apr 21 '21 at 11:09

3 Answers3

2

Using data.table, you can solve it as follows:

library(data.table)

setDT(df)

df[year %in% c(2000, 2002), 
   .(fate = if(.N==2) "persist" else if(year==2002) "added" else "dropped"),
   by = .(site, spe)]

#     site    spe    fate
# 1:     1      A persist
# 2:     1      B dropped
# 3:     1      C dropped
# 4:     1      D persist
# 5:     2      A dropped
# 6:     2      B dropped
# 7:     2      C   added
# 8:     2      D   added
1

please read before check my solution ;

yourdata%>% 
arrange(year) %>% 
filter(year==min(year)) -> first

yourdata%>% 
arrange(year) %>% 
filter(year==max(year)) -> last


checker <- function(x,y,z){
    log_ <- ifelse(((x%in%y)& !(x%in%z)),'dropped','added')
    return(log_)
}


first %>% 
rbind(last) %>% 
group_by(site,spe) %>% 
summarise(n=n()/2) %>% 
ungroup %>% 
mutate(fate=ifelse(n==1,'persist','non_persist')) %>%
mutate(fate=ifelse(fate=='non_persist',checker(spe,first$spe,last$spe),fate)) %>% 
select(site,spe,fate)
Samet Sökel
  • 2,515
  • 6
  • 21
1

For every site you can keep data only for min and max value and assign the fate column based on conditions using case_when.

library(dplyr)

df %>%
  group_by(site) %>%
  filter(year %in% range(year)) %>%
  group_by(spe, .add = TRUE) %>%
  summarise(fate = case_when(n() == 2 ~ 'persist', 
                             any(year != max(.$year)) ~ 'dropped', 
                             any(year != min(.$year)) ~ 'added')) %>%
  ungroup


#   site spe   fate   
#  <int> <chr> <chr>  
#1     1 A     persist
#2     1 B     dropped
#3     1 C     dropped
#4     1 D     persist
#5     2 A     dropped
#6     2 B     dropped
#7     2 C     added  
#8     2 D     added  
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Is it possible to calculate the fate for consecutive years not just first and last year – Gull Aug 24 '23 at 21:00