
I'm trying to plot a very simple dataframe which contains 7 categories (var) and their respective proportions (prop), and I wanted to plot it with a gradient inside the bars. I was able to do that, but now I can no longer display my proportions on the plot. How can I display proportions on this self-made gradient in ggplot2?

  • The simpler plot is this:

[image: first plot]

  • I want it to look like this, but with the proportions and Ns on it:

Note: it should return 7 categories, not 4

[image: plot2]

I tried with geom_text and with annotate(), but I couldn't find a solution.

  • Questions:

  • Q1: How can I plot the proportions and ns on the second plot?

  • Q2: I've never used geom_tile() before, so I've read its documentation. Still, I don't understand how it knows that I only have 7 groups and sizes the bars according to their proportions. Thanks in advance.

Code and data are below:

  • Plot 1 code:
library(tidyverse)
library(forcats)
library(stringr)

## This is the original dataframe (printed output; see "data" below to recreate it):

# > df
# # A tibble: 7 x 4
# # Groups:   var [7]
#   var   x         n  prop
#   <fct> <fct> <int> <dbl>
# 1 VAR_A yes       7  13.7
# 2 VAR_B yes      30  58.8
# 3 VAR_C yes      49  96.1
# 4 VAR_D yes      48  94.1
# 5 VAR_E yes      47  92.2
# 6 VAR_F yes      39  76.5
# 7 VAR_G yes      21  41.2

# plot it: 

df %>% 
  ggplot(aes(y = prop, x = fct_rev(var),  
             fill = as.integer(var))) +  ## HACK to convert discrete to continuous (factor codes 1 to 7)
  geom_bar(stat = "identity", width = 0.3) +
  scale_fill_gradient2(name = "category", 
                       low = "#F1B454", 
                       mid= "#F1EF54",
                       high = "green",
                       space = "Lab",
                       midpoint = 4,
                       guide= 'legend') +
  coord_flip(clip = "off") +
  geom_text(aes(label = str_glue('{round(prop, 1.5)}%  ')),
            nudge_y = 1, 
            nudge_x = 0.05, 
            size = 1.5) + 
  geom_label(aes(label= str_glue('n = {n}')), 
             size = 1.3, 
             nudge_x = 0.3,
             show.legend = FALSE) 
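The fill hack above replaces each category by its integer factor code, so the fill scale only ever sees the numbers 1 to 7, and midpoint = 4 is exactly the middle of that range (VAR_D). A small base-R check, rebuilding only the var column from the posted data:

```r
# Rebuild just the factor from the posted data (levels VAR_A ... VAR_G)
var <- factor(paste0("VAR_", LETTERS[1:7]))

# The "discrete to continuous" hack: each level becomes its integer code
codes <- as.integer(var)
print(codes)

# midpoint = 4 in scale_fill_gradient2() sits halfway between the
# smallest and largest code, i.e. on the 4th level
midpoint <- mean(range(codes))
print(midpoint)
print(levels(var)[midpoint])
```

In the geom_tile() version below, fill is mapped to value instead, which runs from 0 up to n (at most 49). There the same midpoint = 4 no longer marks the middle category, and the legend keys are breaks of that count scale rather than the 7 categories, which is likely why the legend no longer shows 7 entries.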
  • Plot 2 code: (adapted from here)
## df transformation:

df_expanded2 <- df  %>%
  rowwise() %>%
  summarise(group = var,
            value = list(0:n)) %>%
  unnest(cols = value)

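## to recalculate proportions (if needed); 51 is the total N
## (e.g. 7 * 100 / 51 = 13.73, VAR_A's prop). The result is printed, not assigned:

df_expanded2 %>% 
  group_by(var) %>% 
  mutate(prop = round(value * 100 / 51, 2))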

## plot it:

df_expanded2 %>%
  ggplot() +
  geom_tile(aes(
    x = fct_rev(var),
    y = value,
    fill = value), 
    width = 0.9) + ## adds space between bars 
  coord_flip() +
  scale_fill_gradient2(name = "category", 
                       low = "#F1B454", 
                       mid= "#F1EF54",
                       high = "green",
                       space = "Lab",
                       midpoint = 4,
                       guide= 'legend') ## it should be 7, not 4
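Regarding Q2: geom_tile() draws one rectangle per row of its data, centred on (x, y). Its height defaults to the resolution of the y values, which is 1 here because value steps by 1. The 7 groups come from the discrete x aesthetic (fct_rev(var) has 7 levels, so there are 7 positions), and the bar length comes from the number of rows per group: 0:n produces n + 1 tiles stacked at y = 0, 1, ..., n. So the bars follow the counts n, not prop. The base-R sketch below mimics the rowwise()/summarise()/unnest() step to show this; it is an illustration, not the exact tidyverse pipeline.

```r
# Posted counts per category (from the data section)
df <- data.frame(
  var = factor(paste0("VAR_", LETTERS[1:7])),
  n   = c(7L, 30L, 49L, 48L, 47L, 39L, 21L)
)

# Base-R equivalent of rowwise() + summarise(value = list(0:n)) + unnest():
# one row per integer 0, 1, ..., n for each category
df_expanded <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(var = df$var[i], value = 0:df$n[i])
}))

# geom_tile() draws one tile per row, so each bar is made of n + 1 tiles
tiles_per_bar <- table(df_expanded$var)
print(tiles_per_bar)

# ... and the 7 bar positions are simply the 7 levels of the x factor
print(nlevels(df_expanded$var))
```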

  • data:
structure(list(var = structure(1:7, .Label = c("VAR_A", "VAR_B", 
"VAR_C", "VAR_D", "VAR_E", "VAR_F", "VAR_G"), class = "factor"), 
    x = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("no", 
    "yes"), class = "factor"), n = c(7L, 30L, 49L, 48L, 47L, 
    39L, 21L), prop = c(13.73, 58.82, 96.08, 94.12, 92.16, 76.47, 
    41.18)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), groups = structure(list(var = structure(1:7, .Label = c("VAR_A", 
"VAR_B", "VAR_C", "VAR_D", "VAR_E", "VAR_F", "VAR_G"), class = "factor"), 
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 7L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))
  • edit:
# putting back original names (not working)

library(forcats)

df <- df %>% 
  mutate(var = fct_collapse(var, 
                            'Berçário' = 'VAR_A',
                            'Maternal' = 'VAR_B',
                            'Educação Infantil' = 'VAR_C',
                            'Anos Iniciais do E.F.' = 'VAR_D',
                            'Anos Finais do E.F.' = 'VAR_E',
                            'Fundamental 2' = 'VAR_F',
                            'Médio' = 'VAR_G'))
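A note on this edit: fct_collapse() is meant for merging several levels into one. For a plain one-to-one rename, fct_recode() is the dedicated forcats function (same new = "old" syntax), and either way the levels keep their original order. What reorders them is rebuilding the factor from plain strings with factor()/as.factor() without a levels argument, which sorts alphabetically (the problem found in the comments below). A short sketch, assuming forcats is installed:

```r
library(forcats)

var <- factor(paste0("VAR_", LETTERS[1:7]))

# One-to-one renaming: fct_recode() uses the same new = "old" syntax
renamed <- fct_recode(var,
  "Berçário"              = "VAR_A",
  "Maternal"              = "VAR_B",
  "Educação Infantil"     = "VAR_C",
  "Anos Iniciais do E.F." = "VAR_D",
  "Anos Finais do E.F."   = "VAR_E",
  "Fundamental 2"         = "VAR_F",
  "Médio"                 = "VAR_G"
)
print(levels(renamed))   # original order is kept

# Rebuilding the factor from plain strings sorts the levels alphabetically
resorted <- as.factor(as.character(renamed))
print(levels(resorted))

# Passing the original levels explicitly keeps the intended order
kept <- factor(as.character(renamed), levels = levels(renamed))
print(levels(kept))
```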
Larissa Cury

1 Answer


I find your code and plot rather confusing. It is not clear to me what the gradient represents or what you mean by expecting "7 and not 4 categories". I also don't think that geom_tile accurately represents your data: the horizontal axis shows the counts (0 to n) rather than your proportions.

I find it easier to create segments (as per the most upvoted answer in the linked thread). It is then just as easy to add your annotation as before.

I've changed the guide to a colorsteps guide (guide = "colorsteps") because I feel that this is what you were after.

library(tidyverse)

w <- .4
df_segs <-
  df %>%
## splitting to create list
  split(.$var) %>%
## looping over groups and creating segments according to your categories 
## (convert factor to integers first so you can add/subtract a specific width for your segments) 
## x/xend are best generated by a sequence between 0 and your last x value
  map(~ {
    segs <- seq(0, .x$prop, .01)
    cbind(.x, x0 = segs, xend = segs)
  }) %>%
  bind_rows(.id = "var") %>%
  mutate(
    var_f = as.factor(var),
    var_i = as.integer(var_f),
    y = var_i - w,
    yend = var_i + w)

ggplot(df_segs) +
  geom_segment(aes(x = x0, xend = xend, y = y, yend = yend, color = x0)) +
  geom_text(
    data = df,
    aes(x = prop, y = as.integer(as.factor(var)),
      label = str_glue("{round(prop, 1.5)}%")), 
    hjust = 1) +
  scale_color_gradient2(
    name = "category",
    low = "#F1B454",
    mid = "#F1EF54",
    high = "green",
    midpoint = 4,
    guide = "colorsteps"
  ) +
  scale_y_continuous(breaks = 1:7, labels = unique(df$var))
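To unpack the data-preparation step: for each row, seq(0, prop, .01) produces every x position from 0 up to that bar's proportion, and each position becomes one very thin segment (x0 = xend, so it is vertical) spanning y - w to y + w around the bar's integer position. Colouring each segment by its x position is what produces the gradient. A base-R sketch for a single bar, using VAR_A's prop of 13.73 and, purely for illustration, a coarser step of 1 so the output stays small:

```r
w <- 0.4          # half bar thickness, as in the answer
prop <- 13.73     # VAR_A's proportion
pos <- 1          # integer y position of VAR_A

# One x position per step from 0 up to prop (step 1 here instead of .01)
segs <- seq(0, prop, by = 1)

one_bar <- data.frame(
  x0   = segs,
  xend = segs,          # same as x0: each segment is a vertical line
  y    = pos - w,       # bottom edge of the bar
  yend = pos + w        # top edge of the bar
)
print(one_bar)

# With the answer's step of .01 the same bar needs far more segments
print(length(seq(0, prop, by = 0.01)))
```

The small step is what makes the bar look like a smooth filled rectangle; the trade-off is many thousands of segments for the whole plot.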

tjebo
  • Hi @tjebo, many thanks! I need 7 categories because I have 7 categories (VAR_A to VAR_G), not 4. This is what I'm getting in the simpler plot when I add ```guide = "legend"``` to ```scale_fill_gradient2```, but it is not working in the other plots. Would you mind explaining the first part in more detail? I've never used ```geom_segment``` before, but I've just checked the docs. I didn't understand, though, the process of transforming the dataframe so that we could use it with ```geom_segment``` :) – Larissa Cury Mar 07 '23 at 17:40
  • @LarissaCury added explanation to comment in code. I'm still not sure why you would expect 7 categories, because your colors don't have anything to do with VAR_A, etc..? – tjebo Mar 07 '23 at 18:53
  • Thank you!! But it is not working when I use my original var names :( Can you take a look at the edit? Now the order doesn't work anymore. PS: I also couldn't find in the post the equivalent of ```fct_rev``` from the original code. @tjebo – Larissa Cury Mar 08 '23 at 10:47
  • Also: in this particular case, the gradient doesn't have a 'functional' meaning since I'm dealing with categorical variables. We have 7 different categories from a survey of schools, which reported offering those levels according to the prop, and we'd like the plot (and hence the legend) to reflect the offered proportions for each category @tjebo – Larissa Cury Mar 08 '23 at 10:49
  • So I guess the problem with the real ```var``` names was that ```var_f = factor(var)``` orders the levels alphabetically, so ```var_i``` got the wrong order. I was able to fix that by setting the levels before creating ```var_i```. Now it is working! But I still haven't figured out where the equivalent of ```fct_rev``` is, i.e. where the scale gets inverted – Larissa Cury Mar 08 '23 at 11:44
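On the fct_rev() question: as far as I can tell, the answer has no equivalent. It places the bars at integer y positions 1 to 7 without coord_flip(), so the first level (VAR_A) sits at the bottom. That is the reverse of the first plot, where fct_rev(var) plus coord_flip() puts VAR_A on top. One way to recover the original order is to reverse the integer positions themselves (scale_y_reverse() would be another option). A sketch of the position arithmetic, assuming forcats is installed:

```r
library(forcats)

var <- factor(paste0("VAR_", LETTERS[1:7]))

# Answer's positions: level 1 (VAR_A) gets y = 1, i.e. the bottom of the plot
y_answer <- as.integer(var)

# Plot 1's order: coord_flip() + fct_rev(var) puts VAR_A on top.
# Reversing the integer codes gives the same order for geom_segment()
y_flipped <- nlevels(var) + 1L - as.integer(var)

# Same thing via forcats: integer codes of the reversed factor
y_fct_rev <- as.integer(fct_rev(var))

print(data.frame(var, y_answer, y_flipped, y_fct_rev))
```

In the answer's code this would mean something like var_i = as.integer(fct_rev(var_f)) in the mutate(), the same reversed positions for the y of geom_text(), and labels = levels(fct_rev(var_f)) in scale_y_continuous() so the labels follow the new positions. This is an untested suggestion.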