
I have two systems, each executing two benchmarks, and I collect two metrics from each run.

df1 <- data.frame(Benchmark = c("Benchmark1", "Benchmark2"),
              Metric1   = c(120, 200),
              Metric2   = c(200, 150))
df2 <- data.frame(Benchmark = c("Benchmark1", "Benchmark2"),
              Metric1   = c(100, 150),
              Metric2   = c(200, 180))

Now I prepare this data frame for plotting with ggplot2:

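library(dplyr)
library(tidyr)
library(ggplot2)
# After the join, System1 columns carry the suffix .x and System2 columns the suffix .y
df <- left_join(df1, df2, by = "Benchmark") %>%
  gather(Metric, Value, 2:5) %>%
  mutate(System = ifelse(grepl(".x", Metric, fixed = TRUE), "System1", "System2"),
         Metric = ifelse(grepl("1", Metric, fixed = TRUE), "Metric1", "Metric2"))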

And I can get a nice chart like this:

ggplot(df %>% filter(Metric == "Metric1"), aes(x = Benchmark, y = Value, fill = System)) + 
       geom_col(position = "dodge")

[Figure: bar chart of Metric1 per benchmark, with dodged bars for System1 and System2]

Now I want to add a new set of bars with the geomean of those metrics, for each of those systems.

My data frame needs 2 x 2 = 4 new rows, one per (System, Metric) combination, each containing the geomean of that combination's values across the benchmarks.

I know I can use base R to select the data frame columns matching a criterion, compute the mean, and then manually add new rows using bind_rows. Is there a more automated way to accomplish this using dplyr? Perhaps with some combination of group_by() and another function?

Thanks in advance.

Marco Sandri
mbrandalero

2 Answers


Are you looking for something like this?

Wrangled dataset:

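library(dplyr)
library(tidyr)

# gm_mean() must be defined before running this (see the note below the output).
# Note: this reuses the name df2, overwriting the System2 data frame from the question.
df2 <- df %>%
  group_by(Metric, System) %>%
  mutate(GM = gm_mean(Value)) %>%   # geometric mean per (Metric, System)
  ungroup() %>%
  spread(Benchmark, Value) %>%      # one column per benchmark, next to GM
  gather(x, y, -Metric, -System)    # back to long format, with GM as an extra category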

> df2
# A tibble: 12 x 4
    Metric  System          x        y
     <chr>   <chr>      <chr>    <dbl>
 1 Metric1 System1         GM 154.9193
 2 Metric1 System2         GM 122.4745
 3 Metric2 System1         GM 173.2051
 4 Metric2 System2         GM 189.7367
 5 Metric1 System1 Benchmark1 120.0000
 6 Metric1 System2 Benchmark1 100.0000
 7 Metric2 System1 Benchmark1 200.0000
 8 Metric2 System2 Benchmark1 200.0000
 9 Metric1 System1 Benchmark2 200.0000
10 Metric1 System2 Benchmark2 150.0000
11 Metric2 System1 Benchmark2 150.0000
12 Metric2 System2 Benchmark2 180.0000

The gm_mean() function for calculating the geometric mean was taken from the accepted answer to this question.
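
The definition of gm_mean() is not reproduced in this answer. As a minimal sketch, assuming every value is strictly positive, the geometric mean can be computed as the exponential of the mean of the logs. This is an illustrative definition and not necessarily identical to the one in the linked answer:

```r
# Illustrative geometric mean; assumes every value in x is strictly positive.
# Not necessarily the same definition as in the linked answer.
gm_mean <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

# Metric1 on System1 in the question's data: sqrt(120 * 200)
gm_mean(c(120, 200))
```

For the question's data this agrees with the GM rows in the output above, for example sqrt(120 * 200) for Metric1 on System1.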

Plot (faceted to show both Metric1 & Metric2 at the same time):

ggplot(df2, 
       aes(x = x, y = y, fill = System)) + 
  geom_col(position = "dodge") +
  facet_grid(Metric~.)

[Figure: bar chart faceted by Metric, with a GM group of bars alongside Benchmark1 and Benchmark2]

Z.Lin
  • Yes! That's exactly what I was looking for. I didn't think of combining spread + gather one more time to achieve this. (so, there's no command that unites this functionality into a single one?) – mbrandalero Oct 23 '17 at 14:05
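
# Rebuild the long-format data frame from the question
df <- left_join(df1, df2, by = "Benchmark") %>%
  gather(Metric, Value, 2:5) %>%
  mutate(System = ifelse(grepl(".x", Metric, fixed = TRUE), "System1", "System2"),
         Metric = ifelse(grepl("1", Metric, fixed = TRUE), "Metric1", "Metric2"))

# Append rows labelled "Mean": the arithmetic mean across systems
# for each (Benchmark, Metric) pair
df <- df %>%
  group_by(Benchmark, Metric) %>%
  summarise(Value = mean(Value, na.rm = TRUE)) %>%
  mutate(System = "Mean") %>%
  bind_rows(., df)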


ggplot(df %>% filter(Metric == "Metric1"), aes(x = Benchmark, y = Value, fill = System)) + 
       geom_col(position = "dodge")
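
For comparison, the question asks for a geometric mean across benchmarks for each (System, Metric) pair, shown as an extra group of bars. A minimal sketch of that variant with summarise() and bind_rows() might look like the following. The names df_long, geomeans and df_with_gm and the "Geomean" label are assumptions made for illustration, and the data frame is rebuilt by hand from the question's values so the example is self-contained:

```r
library(dplyr)

# Long-format data equivalent to the question's df (values copied from df1 and df2).
df_long <- data.frame(
  Benchmark = rep(c("Benchmark1", "Benchmark2"), times = 4),
  Metric    = rep(c("Metric1", "Metric2", "Metric1", "Metric2"), each = 2),
  System    = rep(c("System1", "System2"), each = 4),
  Value     = c(120, 200, 200, 150, 100, 150, 200, 180),
  stringsAsFactors = FALSE
)

# One geometric-mean row per (System, Metric); assumes strictly positive values.
geomeans <- df_long %>%
  group_by(System, Metric) %>%
  summarise(Value = exp(mean(log(Value)))) %>%
  ungroup() %>%
  mutate(Benchmark = "Geomean")   # hypothetical label for the extra bar group

# Original rows plus the 2 x 2 = 4 new rows.
df_with_gm <- bind_rows(df_long, geomeans)
```

Passing df_with_gm to the same ggplot() call, in place of df, should then draw a "Geomean" group next to Benchmark1 and Benchmark2.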
Dries