My data looks something like this (although there are thousands of sample sites over ~12 years):
library(tidyverse)
df <- tribble(~Year, ~Sample, ~Total_A, ~Total_B, ~Total_C,
2000, 'Riverside', 990, 08, NA,
2000, 'Pasadena', 887, 101, 78,
2000, 'Goleta', 786, NA, NA,
2001, 'Riverside', 985, 89, 21,
2001, 'Pasadena', 992, 67, 33,
2002, 'Riverside', 991, 21, 09,
2002, 'Goleta', 351, 34, NA,
2002, 'Scottsdale', 345, NA, 75)
I used summarize_all (below) to get the following summary table:
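library(dplyr)
# Count non-missing values in each column for each Sample.
# Year is dropped first so it is not counted as a column of its own.
df1 <- df %>%
  select(-Year) %>%
  group_by(Sample) %>%
  summarize_all(~ sum(!is.na(.)))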
Sample, Total_A, Total_B, Total_C
Riverside, 3, 3, 2
Pasadena, 2, 2, 2
Goleta, 2, 1, 0
Scottsdale, 1, 0, 1
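For reference, summarize_all() is superseded in current dplyr by across(). Below is a minimal sketch of the same count written with across(), assuming dplyr 1.0.0 or later. Selecting columns with starts_with("Total") keeps Year out of the counts. Note that the rows come back sorted alphabetically by Sample rather than in the order shown above.

```r
library(dplyr)
library(tibble)

df <- tribble(~Year, ~Sample, ~Total_A, ~Total_B, ~Total_C,
  2000, "Riverside",  990,   8,  NA,
  2000, "Pasadena",   887, 101,  78,
  2000, "Goleta",     786,  NA,  NA,
  2001, "Riverside",  985,  89,  21,
  2001, "Pasadena",   992,  67,  33,
  2002, "Riverside",  991,  21,   9,
  2002, "Goleta",     351,  34,  NA,
  2002, "Scottsdale", 345,  NA,  75)

# For each Sample, count the non-missing values in every Total_* column.
# starts_with("Total") restricts across() to those columns, so Year is not counted.
counts <- df %>%
  group_by(Sample) %>%
  summarise(across(starts_with("Total"), ~ sum(!is.na(.x))))

print(counts)
```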
I would like to add a column to the table that lists the years for which data are available for each sample. Is there any way I can do this with summarize_all (or another summarize command)? I've thought about using something with "paste" and "unique(Year)", but I'm not sure that is possible. I'm new to R and would appreciate any guidance. Here is roughly what I am looking for:
Sample, Total_A, Total_B, Total_C, Years_Available
Riverside, 3, 3, 2, 2000/2001/2002
Pasadena, 2, 2, 2, 2000/2001
Goleta, 2, 1, 0, 2000/2002
Scottsdale, 1, 0, 1, 2002
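One possible approach, sketched under the assumption that dplyr 1.0.0 or later is available, is to compute the counts and the year list in a single summarise() call. paste(..., collapse = "/") joins the sorted unique years for each sample into one string. Note that Years_Available, as written here, lists every year in which a sample has a row, even if all of its Total_* values are NA in that year.

```r
library(dplyr)
library(tibble)

df <- tribble(~Year, ~Sample, ~Total_A, ~Total_B, ~Total_C,
  2000, "Riverside",  990,   8,  NA,
  2000, "Pasadena",   887, 101,  78,
  2000, "Goleta",     786,  NA,  NA,
  2001, "Riverside",  985,  89,  21,
  2001, "Pasadena",   992,  67,  33,
  2002, "Riverside",  991,  21,   9,
  2002, "Goleta",     351,  34,  NA,
  2002, "Scottsdale", 345,  NA,  75)

years <- df %>%
  group_by(Sample) %>%
  summarise(
    # Non-missing counts for each Total_* column (Year is excluded)
    across(starts_with("Total"), ~ sum(!is.na(.x))),
    # Every year in which this Sample appears, joined as e.g. "2000/2001/2002"
    Years_Available = paste(sort(unique(Year)), collapse = "/")
  )

print(years)
```

If you would rather keep summarize_all(), another option is to build the year column in a separate summarise() and attach it to df1 with left_join(df1, ..., by = "Sample").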