Below is a portion of something similar to my dataset:
require(dplyr)
alldata
site date percent_rank Label
01A 2013-01-01 0.32 Normal
01B 2013-01-01 0.12 Low
01C 2013-01-01 0.76 High
02A 2013-01-01 0 N/A
02B 2013-01-01 0.16 Low
02C 2013-01-01 0.5 Normal
01A 2013-01-02 0.67 Normal
01B 2013-01-02 0.01 Low
01C 2013-01-02 0.92 High
I assigned each percent_rank a label based on the value (0 to 0.25 to 0.75 to 1 for the three categories). I would now like to produce a summary table in this format:
site Low Normal High Missing
01A 32 47 92 194
01B 232 23 17 93
01C 82 265 12 6
where each site would have a count of the occurrences of low, normal, and high values for all dates with that site label (there is one for every day of the year), and N/A values would be counted for the "Missing" column.
I have tried the following:
alldata <- %>% group_by(site) %>% mutate(length(Label == "Low"))
which returns the total value of all records, rather than a count of "Low" per site, and
alldata <- %>% group_by(site) %>% mutate(length(which(Label == "Low")))
which returns a value several thousand higher than the total number of records. The idea was that I would repeat this function to create four new columns with four separate mutate lines (one for each category), and this would produce my summary table. I have also tried some variations of aggregate(), although the function component was less clear to me for what I am aiming to do. This seems like it should be a pretty straightforward thing to do (and group_by served me well for calculating the percent rank and associated labels) but I have not been able to find a solution as of yet. Any tips are much appreciated!