I'm trying to produce a table with summary totals and means across the whole dataset, and then by sub-category (f_grp), and show this by site.
I can use the group_by
function to group by, which works well for reporting total_count and Mean_per_litre, but I would then like the same values for each category, as shown in f_grp.
|Site |total_count |Mean_per_litre
|1 |66 |3.33333333
|2 |77 |4.27777778
|3 |65 |3.38541667
|4 |154 |8.85057471
etc
I've tried group_by for both site and f_grp but this isn't quite right
|site |f_grp |total_count |mean_per_litre
|1 |1c |3 |1.666667
|1 |1d |15 |4.166667
|1 |2a |1 |1.666667
|1 |2b |47 |11.190476
This isn't quite right as its not easy to read and I've now lost the original total columns I had in the first table (sorry about the tables, cant get them to work here).
dat$site=as.factor(dat$site)
dat$count=as.numeric(dat$count)
dat$f_grp=as.factor(dat$f_grp)
# totals across all f_grp
tabl1 <- dat %>%
group_by(site) %>%
summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl1
# totals FG 1b
tabl2 <- dat %>%
group_by(site) %>%
filter(f_grp== '1b') %>%
summarise ('1b_total_count' = sum(count))
tabl2
### BUT - this doesnt give a correct mean, as it only shows the mean of '1b' when only '1b' is present. I need a mean over the entire dataset at that site.
# table showing totals across whole dataset
tabl7 <- dat %>%
summarise (total_count = sum(count, na.rm = TRUE), Total_mean_per_litre = mean(count_l_site, na.rm = TRUE))
tabl7
# table with means for each site by fg
table6 <- dat %>%
group_by(site, f_grp) %>%
summarise (total_count = sum(count), mean_per_litre = mean(count_l_site, na.rm = TRUE))
table6
Ideally I need a way to extract the f-grp categories, put them as column headings, and then summarise means by site for those categories. But filtering the data and then joining multiple tables, gives incorrect means (as not mean of whole dataset, but a subset of that category, ie: when f_grp value is present only).
Many thanks to all who have read this far :)
> dput(head(dat))
structure(list(X = 1:6, site = structure(c(1L, 10L, 11L, 12L,
13L, 14L), levels = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18"), class = "factor"),
count = c(0, 0, 0, 0, 0, 0), f_grp = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), levels = c("1b", "1c", "1d", "2a", "2b"), class = "factor"),
count_l_site = c(0, 0, 0, 0, 0, 0)), row.names = c(NA, 6L
), class = "data.frame")
Updated:
Following advice here from Jon, and using the mtcars data (which worked as expected) I've tried the same method using my own data.
I can almost produce what's needed, but the totals are coming through as a row when they are needed as a column.
tabl1 <- dat %>%
group_by(site) %>%
summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site)) %>%
mutate(fg = "total")
tabl1
tabl2_fg <- dat %>%
group_by(site, f_grp = as.character(f_grp)) %>%
summarize(total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl2_fg
tabl4 <-
bind_rows(tabl1, tabl2_fg) %>%
arrange(site, f_grp) %>%
tidyr::pivot_wider(names_from = f_grp, values_from = c(Mean_per_litre, total_count), names_vary = "slowest")
tabl4
Output as follows
Next steps: move the circled outputs and put them at the beginning of the table remove every other line result - left with a simple table rows = sites; columns: total count; total mean; then columns for each fg count & mean: eg 1c count; 1c mean; 1d count; 1d mean.