How to extract specific values when making a summary table (using summarize_all)?

Question

My data looks something like this (although there is thousands of Sample sites over ~12 years:

library(tidyverse)
df <- tribble(~Year, ~Sample, ~Total_A, ~Total_B, ~Total_C,
            2000, 'Riverside', 990, 08, NA,
            2000, 'Pasadena', 887, 101, 78,
            2000, 'Goleta', 786, NA, NA,
            2001, 'Riverside', 985, 89, 21,
            2001, 'Pasadena', 992, 67, 33,
            2002, 'Riverside', 991, 21, 09,
            2002, 'Goleta', 351, 34, NA,
            2002, 'Scottsdale', 345, NA, 75)

I have used summarize all (below) to get the following summary data table.

   library(dplyr)
   df1 <- df %>% 
    group_by(Sample) %>%
    summarize_all(funs(sum(!is.na(.)))) 

Sample, Total_A, Total_B, Total_C
Riverside, 3, 3, 2
Pasadena, 2, 2, 2
Goleta, 2, 1, 0
Scottsdale, 1, 0, 1

I would like to add a column to the data table that gives each year of data that is available for each sample. Is there anyway I can do this using summarize_all (or any other summarize command?) I've thought maybe using something with "paste unique$Year" but unsure if that is possible. I'm new to R and would appreciate any guidance. Here is kind of what I am looking for:

Sample, Total_A, Total_B, Total_C, Years_Available 
Riverside, 3, 3, 2, 2000/2001/2002
Pasadena, 2, 2, 2, 2000/2001
Goleta, 2, 1, 0, 2000/2002
Scottsdale, 1, 0, 1, 2002

Related / possible duplicate: [*Collapse / concatenate / aggregate a column to a single comma separated string within each group*](https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w) — Jaap, Oct 06 '17 at 08:51

Preston · Accepted Answer · 2017-10-06T09:14:33.597

1

You could try this:

df1 <- df %>% 
  group_by(Sample) %>%
  mutate(years = toString(Year)) %>%
  group_by(Sample, years) %>%
  summarize_all(funs(sum(!is.na(.))))

The output I get is:

      Sample            years  Year Total_A Total_B Total_C
       <chr>            <chr> <int>   <int>   <int>   <int>
1     Goleta       2000, 2002     2       2       1       0
2   Pasadena       2000, 2001     2       2       2       2
3  Riverside 2000, 2001, 2002     3       3       3       2
4 Scottsdale             2002     1       1       0       1

edited Oct 06 '17 at 09:14

answered Oct 06 '17 at 08:57

Preston

7,399
8
54
84

I tried and it seems to be close to what I want! However it just pastes the first year over and over (ex: the column will be years with the value being "2000, 2000, 2000"...etc.) – Juliet R Oct 06 '17 at 09:01
@JulietR Are we using the same data? I've updated the answer with the output of the sample data given – Preston Oct 06 '17 at 09:05
@User632718, For some reason its not working on my larger data set (not your fault of course). But since it works on my sample data it should be fine when I tinker with it a bit. Thank you though! – Juliet R Oct 06 '17 at 09:08
@JulietR I guess you could be having an issue with the way that you group then, you could potentially add the "unique" function in the mutate which would probably give you what you need. Also, if this answered your question, you're welcome to upvote or accept, good luck – Preston Oct 06 '17 at 09:14
@User632718, when I add unique in there would it be %>% mutate(years = toString(unique(Years))? Sorry for the basic syntax question, I've been trying to learn R on my own so I'm not super fluid with this stuff yet. – Juliet R Oct 06 '17 at 09:19
@JulietR that's what I would suggest as a starting point, hope it works – Preston Oct 06 '17 at 09:43

How to extract specific values when making a summary table (using summarize_all)?

1 Answers1