I am annotating my graph with summary statistics. I'd like to use a bold font to quickly draw the user's eyes to the best/worst statistics by group. The highlighted numbers would need to be determined at run-time by the data itself.

Here's an example using the ChickWeight dataset, showing changes in chick weights based on their diet:

library(ggplot2)
library(dplyr)

# Calculate end vs start weights
df <- merge(filter(ChickWeight, Time==21), filter(ChickWeight, Time==0), by=c("Chick", "Diet"))
df$dWeight <- df$weight.x - df$weight.y

# Summary statistics: sd & mean
df.stat <- do.call(data.frame, 
                   aggregate(dWeight ~ Diet, 
                             data=df, 
                             FUN = function(x) c(SD=sd(x), MN=mean(x))))

ggplot(data = df) + 
    facet_grid(Diet ~ .) +
    geom_histogram(binwidth=10, aes(x=dWeight)) + 
    geom_vline(data=df.stat, aes(xintercept = dWeight.MN), color="black") + 
    geom_text(data=df.stat, aes(x=Inf, 
                                y=Inf, 
                                label = sprintf("\nmean = %4.1f\nsd = %4.1f", 
                                                dWeight.MN, dWeight.SD), 
                                hjust=1, 
                                vjust=1)) 

In the graph below, I would want to only highlight the following text:
In group 3, "mean = 229.5" would be shown in bold.
In group 4, "sd = 43.9" would be shown in bold.

[Image: histograms of dWeight faceted by Diet, each panel annotated with its mean and sd]

Julius Vainora
ddessert
  • Making bold only part of the text and especially the one that is given by a variable is tricky, looks like you will need something like this https://stackoverflow.com/a/50768373/1320535. – Julius Vainora Dec 18 '18 at 01:07
  • @JuliusVainora, that technique would seem to require two `geom_text()` calls. One to print the non-bold text and another to print the bold text, leaving the other parts `phantom()`. I'm not sure how to make that work with `facet_grid()`. And I'm totally lost on how to use `expression()`, `bquote()`, `substitute()`, etc. in plotmath. Each attempt gives a new error message which leads to another rabbit hole to chase into. – ddessert Dec 18 '18 at 20:09

2 Answers

If you don't want to muck around with parsing you can add a condition to your plot labels and you'll come quite close.

Data

df.plot <- df %>%
    # Combine df and df.stat -
    # this also removes the calls to df.stat in your secondary geoms.
    left_join(df.stat, by = "Diet") %>%
    # Add global maximum of MN and global minimum of SD to every row.
    mutate(dWeight.MN.max = max(dWeight.MN),
           dWeight.SD.min = min(dWeight.SD))

Code

ggplot(data = df.plot) + 
    facet_grid(Diet ~ .) +
    geom_histogram(binwidth = 10, aes(x = dWeight)) + 
    geom_vline(aes(xintercept = dWeight.MN), color="black") + 
    geom_text(aes(x = Inf, 
                  y = Inf, 
                  label = sprintf("\nmean = %4.1f", dWeight.MN), 
                  hjust = 1,
                  vjust = 1,
                  # bold if mean == mean maximum
                  fontface = ifelse(dWeight.MN == dWeight.MN.max, 2, 1))) +
    geom_text(aes(x = Inf, 
                  y = Inf, 
                  label = sprintf("\n\nsd = %4.1f", dWeight.SD), 
                  hjust = 1,
                  vjust = 1,
                  # bold if sd == sd minimum
                  fontface = ifelse(dWeight.SD == dWeight.SD.min, 2, 1))) +
    theme_gray()

Explanation

With fontface = you can make your geom_text() italic or bold. The ifelse() in the expression checks if the value is equal to the global maximum/minimum and sets the text to bold (= 2) if true and leaves it plain (= 1) if false.
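One caveat with the code above: df.plot has one row per chick, so each label is drawn once per row and the copies overprint each other. A minimal sketch of the same ifelse() idea, assuming a one-row-per-Diet summary table so that each label is drawn once per facet, could look like this. The names cw, stat, face.MN and face.SD are made up for illustration, and the setup uses base R only so the block runs without dplyr.

```r
library(ggplot2)

# End-minus-start weight per chick, as in the question (base R only).
cw    <- as.data.frame(ChickWeight)
end   <- cw[cw$Time == 21, c("Chick", "Diet", "weight")]
start <- cw[cw$Time == 0,  c("Chick", "Diet", "weight")]
df    <- merge(end, start, by = c("Chick", "Diet"))
df$dWeight <- df$weight.x - df$weight.y

# One row per Diet: mean, sd, and the font face to use for each statistic.
stat <- data.frame(
  Diet = factor(levels(df$Diet), levels = levels(df$Diet)),
  MN   = as.numeric(tapply(df$dWeight, df$Diet, mean)),
  SD   = as.numeric(tapply(df$dWeight, df$Diet, sd))
)
stat$face.MN <- ifelse(stat$MN == max(stat$MN), "bold", "plain")
stat$face.SD <- ifelse(stat$SD == min(stat$SD), "bold", "plain")

ggplot(df, aes(x = dWeight)) +
  facet_grid(Diet ~ .) +
  geom_histogram(binwidth = 10) +
  geom_vline(data = stat, aes(xintercept = MN), color = "black") +
  # Leading newlines push each label down so the two lines do not overlap.
  geom_text(data = stat, hjust = 1, vjust = 1,
            aes(x = Inf, y = Inf, fontface = face.MN,
                label = sprintf("\nmean = %4.1f", MN))) +
  geom_text(data = stat, hjust = 1, vjust = 1,
            aes(x = Inf, y = Inf, fontface = face.SD,
                label = sprintf("\n\nsd = %4.1f", SD)))
```

As in the answer, this bolds the whole line ("mean = ..." or "sd = ..."), not just the number.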

Roman
  • Using your technique and the `latex2exp` library, I was able to modify the `geom_text` statement to only bold the number like this: `label = TeX(sprintf(ifelse(df.plot$dWeight.MN==df.plot$dWeight.MN.max, "mean = \\textbf{%4.1f}", "mean = %4.1f"), df.plot$dWeight.MN)))` – ddessert Dec 18 '18 at 07:52
  • Great work! You should post your approach as an answer and accept it yourself. This way this question will no longer be flagged as "open". See [here](https://stackoverflow.com/help/someone-answers) how to accept an answer. – Roman Dec 18 '18 at 08:12

Selective, partially bold text with facet_grid

Taking the ifelse idea from @Roman, here is a solution using the latex2exp library to build a LaTeX string that allows bold font changes within the string. latex2exp translates the TeX string to a plotmath expression.

It is still not perfect: the solution is limited to two lines of text, which can only be center-aligned to each other. latex2exp does not appear to support newlines, which forced me to use overset instead.

Another LaTeX option would be an {n x 1} matrix, but latex2exp doesn't support matrices either (run latex2exp_supported() to see which LaTeX expressions are supported).

Two separate geom_text() calls would also work if there were a reliable way of spacing and aligning the text as the user resizes or zooms the plot.

Data

library(ggplot2)
library(dplyr)
library(latex2exp)

# Calculate end - start weights
df <- inner_join(filter(ChickWeight, Time==21), 
                 filter(ChickWeight, Time==0), 
                 by=c("Chick", "Diet")) %>%
      mutate(dWeight=weight.x-weight.y) %>% 
      select(Chick, Diet, dWeight)

# Summary statistics: sd & mean
df.stats <- df %>% 
            group_by(Diet) %>% 
            summarise(MN=mean(dWeight), SD=sd(dWeight)) %>% 
            mutate(is.max.MN=(MN==max(MN))) %>% 
            mutate(is.min.SD=(SD==min(SD)))

ggplot Command

ggplot(data=df) + 
    facet_grid(Diet ~ .) +
    geom_histogram(binwidth=10, aes(x=dWeight)) + 
    geom_vline(data=df.stats, aes(xintercept = MN), color="black") + 
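    geom_text(data=df.stats,
              aes(x=Inf, 
                  y=Inf, 
                  hjust=1, 
                  vjust=1),
              # label is built outside aes(), so columns need the df.stats$ prefix
              label = TeX(paste("\\overset{mean =", 
                                # wrap the number in \textbf{} only for the largest mean
                                sprintf(ifelse(df.stats$is.max.MN, "\\textbf{%4.1f}", "%4.1f"), df.stats$MN),
                                "}{sd =",
                                # wrap the number in \textbf{} only for the smallest sd
                                sprintf(ifelse(df.stats$is.min.SD, "\\textbf{%4.1f}", "%4.1f"), df.stats$SD),
                                "}"
                          )))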

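Note that the label for geom_text is set outside the aes function. Arguments outside aes are not evaluated within the layer's data, so the columns have to be referenced explicitly as df.stats$MN and df.stats$SD.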

Also, this ggplot command generates a warning message (from the TeX statement):
In is.na(x) : is.na() applied to non-(list or vector) of type 'expression'
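A possible alternative, sketched here as an assumption and not as part of the solution above, is to skip latex2exp and write the plotmath strings directly, then let geom_text() parse them with parse = TRUE. Because the label is then an ordinary character column, it can stay inside aes(), which should also sidestep the is.na() warning. The names cw, stat, fmt and label are made up for illustration. atop() stacks two expressions centered, so the two-line, center-aligned limitation remains.

```r
library(ggplot2)

# End-minus-start weight per chick (base R version of the setup above).
cw    <- as.data.frame(ChickWeight)
end   <- cw[cw$Time == 21, c("Chick", "Diet", "weight")]
start <- cw[cw$Time == 0,  c("Chick", "Diet", "weight")]
df    <- merge(end, start, by = c("Chick", "Diet"))
df$dWeight <- df$weight.x - df$weight.y

# One row per Diet with mean and sd.
stat <- data.frame(
  Diet = factor(levels(df$Diet), levels = levels(df$Diet)),
  MN   = as.numeric(tapply(df$dWeight, df$Diet, mean)),
  SD   = as.numeric(tapply(df$dWeight, df$Diet, sd))
)

# Format a number as a quoted plotmath string, wrapped in bold() when flagged.
fmt <- function(x, is.bold) {
  num <- sprintf('"%.1f"', x)
  ifelse(is.bold, sprintf("bold(%s)", num), num)
}

# One plotmath string per Diet, e.g. atop("mean = " * bold("..."), "sd = " * "...")
stat$label <- sprintf('atop("mean = " * %s, "sd = " * %s)',
                      fmt(stat$MN, stat$MN == max(stat$MN)),
                      fmt(stat$SD, stat$SD == min(stat$SD)))

ggplot(df, aes(x = dWeight)) +
  facet_grid(Diet ~ .) +
  geom_histogram(binwidth = 10) +
  geom_vline(data = stat, aes(xintercept = MN), color = "black") +
  # parse = TRUE makes geom_text() interpret the strings as plotmath.
  geom_text(data = stat, aes(x = Inf, y = Inf, label = label),
            parse = TRUE, hjust = 1.1, vjust = 1.2)
```

The hjust and vjust values slightly above 1 are only there to keep the text off the panel border; adjust them to taste.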

ddessert