I have a data frame of species sightings recorded over several years at different sites, so each year can contain multiple records. I would like to filter it and compute some values based on certain columns, while keeping all columns for further analyses. My previous code used summarise(), but since that drops the columns I am not summarising, I am trying to avoid it.
Let's say the columns I'm interested in working with at the moment are as follows:
df <- data.frame("Country" = LETTERS[1:5], "Site" = LETTERS[6:10], "Species" = 1:5, "Year" = 1981:2010)
I would like to calculate:
1- The cumulative sum of the records in which a species has been documented within each site, in a new column "Spsum".
2- The number of different years in which each species has been seen at a particular site, in a new column "nYear". This could be a cumulative count as well.
For example, if species 1 has been recorded 5 times in 1981 and 2 times in 1982 in Site G, Spsum would show 7 (cumulative sum of records), whereas nYear would show 2, as it was spotted in two different years. So far I've got this, but nYear comes out as 0 in every row.
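To make the example concrete, here is a small toy input in base R that matches it. The values (and the Country label "B") are made up for illustration and are not part of my real data; the last two lines only confirm the totals I expect Spsum and nYear to reach.

```r
# Hypothetical toy data: species 1 at Site G, 5 records in 1981 and 2 in 1982
toy <- data.frame(
  Country = "B",
  Site    = "G",
  Species = 1,
  Year    = c(rep(1981, 5), rep(1982, 2))
)

expected_spsum <- nrow(toy)                 # total records for this species/site
expected_nyear <- length(unique(toy$Year))  # distinct years with a sighting
```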
library(dplyr)

Df1 <- df %>%
  filter(Year > 1980) %>%
  group_by(Country, Site, Species, Year) %>%
  # this is the step that returns 0 for nYear
  mutate(nYear = n_distinct(Year[Species %in% Site])) %>%
  ungroup()
Thanks!