I have a data set of 20 years of measurements (14600 x 6) and need to compute the geometric mean of $tu per $name and $trophic. Originally, I had my data split into three data frames and did the following:
Old code, based on the split data frames:
library(dplyr)
library(tidyr)
trophic_pp <- df_pp %>%
  select(sites, name, tu_pp) %>%
  group_by(name) %>%
  mutate(row = row_number()) %>%  # row id so pivot_wider() keeps one row per measurement
  pivot_wider(names_from = name, values_from = tu_pp) %>%
  replace(is.na(.), 0) %>%
  select(-row)
trophic_dc<- ...... same
trophic_pt<- ...... same
then
trophic_pp <- trophic_pp %>%
  mutate(sum_pp = rowSums(across(where(is.numeric))))
trophic_dc<- ...... same
trophic_pt<- ...... same
then
trophic_pp_sites <- trophic_pp %>%
  select(sites, sum_pp) %>%
  group_by(sites) %>%
  summarise(gmean = gmean(sum_pp)) %>%
  add_column(trophic = "pp", .before = "gmean")  # add_column() is from the tibble package
trophic_dc<- ...... same
trophic_pt<- ...... same
Then I merged the results so that I could finally plot them:
all_trophic <- Reduce(
  function(x, y) merge(x, y, all = TRUE),
  list(trophic_pp, trophic_dc, trophic_pt)
) %>%
  mutate(type = case_when(
    startsWith(sites, "R") ~ "river",
    startsWith(sites, "T") ~ "tributary"
  ))
As you can see, this code is long and repetitive.
I have since rearranged my data into a single data frame instead of three, and its str() output now looks like this:
tibble [14,100 x 6] (S3: tbl_df/tbl/data.frame)
$ name : Factor w/ 6 levels "Al","As","Cu",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cas : chr [1:14100] "7429-90-5" "7429-90-5" "7429-90-5" "7429-90-5" ...
$ sites : chr [1:14100] "R1" "R1" "R1" "R5" ...
$ conc : num [1:14100] 12.12 12.12 12.12 2.06 2.06 ...
$ trophic : chr [1:14100] "tu_pp" "tu_pc" "tu_sc" "tu_pp" ...
$ tu : num [1:14100] 12.41 4.83 7.22 2.11 0.82 ...
Each $name has its own $cas and 9 $sites, and each $tu is calculated from $conc for three different $trophic levels. Therefore, $tu is the only variable that changes in every single row.
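I am struggling to calculate the geometric mean. I tried the following:
Define a geometric mean function:
gmean <- function(x, na.rm = TRUE) {
  # exp of the mean of the logs; na.rm is passed on so missing values can be dropped
  exp(mean(log(x), na.rm = na.rm))
}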
Then I created a list split by $trophic:
trophic_list <- split(df, df$trophic)
and tried to run lapply over the list:
for (i in seq_along(trophic_list)) {
trophic_list[[i]] <- within(trophic_list[[i]], {
gmean <- lapply(trophic_list[tu], FUN: gmean
})
}
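To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes dplyr is available and uses a made-up `toy` data frame (hypothetical values, not my real data) with the same column names. It computes one geometric mean of $tu per $name and $trophic in a single grouped summary, so no splitting into separate data frames is needed:

```r
library(dplyr)

# Geometric mean; assumes strictly positive values (log(0) is -Inf)
gmean <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

# Hypothetical toy data with the same column names as my real data frame
toy <- data.frame(
  name    = factor(c("Al", "Al", "Al", "Al", "As", "As")),
  sites   = c("R1", "R5", "R1", "R5", "R1", "R1"),
  trophic = c("tu_pp", "tu_pp", "tu_pc", "tu_pc", "tu_pp", "tu_pc"),
  tu      = c(2, 8, 1, 4, 3, 5),
  stringsAsFactors = FALSE
)

# One geometric mean of tu per name and trophic level
result <- toy %>%
  group_by(name, trophic) %>%
  summarise(gmean_tu = gmean(tu), .groups = "drop")

print(result)
```

If a value per site is needed as well, sites could presumably be added to group_by() in the same way.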
Sorry for the long explanation; I would appreciate your help.