How to rename the bins in ggplot in R

Question

So basically I have created the bins and have the mean of each bin, with these two columns in a data frame. Now I am plotting these two columns, but I want exact numbers as x labels instead of bins. I am considering renaming each bin by its midpoint. Please look at the pictures. The first one is my current plot and the second is the plot I want to achieve.

My current plot: [image]

What I want to have: [image]

My data frame is like this: [image]

Hi. You've supplied no code. That makes it hard for others to help you. Please read this and then edit your question to include the suggestions therein: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?r=Saves_AllUserSaves — John Polo, Nov 05 '22 at 20:37

AndS. · Answer 1 · 2022-11-05T21:41:14.480

If you have groups that (I assume) you made with cut, you could pull out the min and max of each bin and calculate their midpoint before you summarize and plot. Note that I made the regex fairly permissive because I don't know offhand whether cut always produces left-open, right-closed intervals, so it accepts either bracket type at either end.

library(tidyverse)

#example like yours
mtcars |>
  mutate(grp = cut(hp, 10)) |>
  group_by(grp) |>
  summarise(mpg_mean = mean(mpg)) |>
  ggplot(aes(grp, mpg_mean))+
  geom_point()


#solution
mtcars |>
  mutate(grp = cut(hp, 10)) |>
  extract(grp, 
          into = c("min", "max"), 
          remove = FALSE,
          regex = "(?:\\(|\\[)(.*),(.*)(?:\\)|\\])",
          convert = TRUE) |>
  mutate(mean_grp = (min + max)/2)|>
  group_by(mean_grp) |>
  summarise(mpg_mean = mean(mpg)) |>
  ggplot(aes(mean_grp, mpg_mean))+
  geom_point()
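
On the question of which side of each interval cut closes, here is a small base R sketch (the vectors x and b are made up for illustration). By default the intervals are left-open and right-closed; right = FALSE flips that, and include.lowest = TRUE closes the outermost interval on both sides. So either bracket type can appear at either end of a label, which is what the permissive regex above allows for.

```r
x <- c(1, 5, 10)   # illustrative values
b <- c(0, 5, 10)   # illustrative break points

# Default: right = TRUE, so each interval is (lower, upper]
default_lv <- levels(cut(x, b))                        # "(0,5]" "(5,10]"

# right = FALSE: each interval is [lower, upper)
left_closed_lv <- levels(cut(x, b, right = FALSE))     # "[0,5)" "[5,10)"

# include.lowest = TRUE: the first interval also includes its lower bound
lowest_lv <- levels(cut(x, b, include.lowest = TRUE))  # "[0,5]" "(5,10]"
```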

EDIT

Here is another option if you just want to relabel the axis and not actually transform the data:

# turn a label such as "(51.7,80.3]" into the mean of its two bounds
lab_fun <- function(x){
  str_split(x, ",") |>
    map_dbl(~ parse_number(.x) |> mean())
}

mtcars |>
  mutate(grp = cut(hp, 10)) |>
  group_by(grp) |>
  summarise(mpg_mean = mean(mpg)) |>
  ggplot(aes(grp, mpg_mean))+
  geom_point()+
  scale_x_discrete(labels = lab_fun)
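
A different route, not taken in the code above, avoids parsing labels altogether: choose the break points yourself, ask cut for integer bin codes with labels = FALSE, and look up each bin's midpoint. This is only a sketch in base R. The break points and the names mids, bin_idx, binned, and mpg_by_bin are assumptions made for illustration.

```r
# Break points chosen by hand for this sketch; they must cover range(mtcars$hp)
breaks <- seq(50, 350, by = 50)

# Midpoint of each consecutive pair of breaks: 75, 125, ..., 325
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# labels = FALSE returns the bin index instead of a factor of interval labels
bin_idx <- cut(mtcars$hp, breaks = breaks, labels = FALSE)

# Attach the midpoint to every row, then average mpg within each bin
binned <- data.frame(hp_mid = mids[bin_idx], mpg = mtcars$mpg)
mpg_by_bin <- aggregate(mpg ~ hp_mid, data = binned, FUN = mean)
names(mpg_by_bin)[2] <- "mpg_mean"
```

The result has a numeric hp_mid column, so it can go straight into ggplot(aes(hp_mid, mpg_mean)) with a continuous x axis.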

Allan Cameron · Accepted Answer · 2022-11-05T22:34:06.190

To reproduce the style of the plot image you included, you can do:

library(tidyverse)

df %>%
  mutate(bin_group = gsub("\\(|\\]", "", bin_group)) %>%
  separate(bin_group, sep = ",", into = c("lower", "upper")) %>%
  mutate(across(lower:upper, as.numeric)) %>%
  mutate(`Birth weight (g)` = (upper + lower) / 2) %>%
  ggplot(aes(`Birth weight (g)`, mean_28_day_mortality)) +
  geom_vline(xintercept = 1500) +
  geom_point(shape = 18, size = 4) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "One-year mortality", y = NULL) +
  theme_bw(base_family = "serif", base_size = 20) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "black", size = 0.5),
        plot.title = element_text(hjust = 0.5))

Edit

To set specific axis ranges, use the limits argument in scale_x_continuous and scale_y_continuous:

library(tidyverse)

df %>%
  mutate(bin_group = gsub("\\(|\\]", "", bin_group)) %>%
  separate(bin_group, sep = ",", into = c("lower", "upper")) %>%
  mutate(across(lower:upper, as.numeric)) %>%
  mutate(`Birth weight (g)` = (upper + lower) / 2) %>%
  ggplot(aes(`Birth weight (g)`, mean_28_day_mortality)) +
  geom_vline(xintercept = 1500) +
  geom_point(shape = 18, size = 4) +
  scale_x_continuous(labels = scales::comma, limits = c(1350, 1650),
                     breaks = seq(1350, 1650, 50)) +
  scale_y_continuous(limits = c(0, 0.1), name = NULL) +
  labs(title = "One-year mortality") +
  theme_bw(base_family = "serif", base_size = 20) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "black", size = 0.5),
        plot.title = element_text(hjust = 0.5))
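
One caveat on limits, shown as a minimal sketch with made-up data (df_demo is hypothetical, not the data from the question). By default, limits on a continuous scale turn values outside the range into NA before drawing, whereas coord_cartesian only crops the visible window. Here layer_data is used to inspect what each version of the plot would draw.

```r
library(ggplot2)

# Hypothetical three-point data set, for illustration only
df_demo <- data.frame(x = c(1365, 1395, 1425),
                      y = c(0.056, 0.049, 0.045))
p <- ggplot(df_demo, aes(x, y)) + geom_point()

# Scale limits: the point at x = 1365 lies outside the range and is censored to NA
p_scale <- p + scale_x_continuous(limits = c(1380, 1650))

# Coordinate limits: all three points are kept, only the view changes
p_zoom <- p + coord_cartesian(xlim = c(1380, 1650))

# Count the x values that were replaced by NA in each version
n_na_scale <- sum(is.na(layer_data(p_scale)$x))
n_na_zoom  <- sum(is.na(layer_data(p_zoom)$x))
```

With the limits of c(1350, 1650) used above, every bin midpoint (1365 to 1635) lies inside the range, so nothing is dropped there.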

Data used (obtained from image in question using OCR)

df <- structure(list(bin_group = structure(1:10, 
        levels = c("(1.35e+03,1.38e+03]", 
        "(1.38e+03,1.41e+03]", "(1.41e+03,1.44e+03]", "(1.44e+03,1.47e+03]", 
        "(1.47e+03,1.5e+03]", "(1.5e+03,1.53e+03]", "(1.53e+03,1.56e+03]", 
        "(1.56e+03,1.59e+03]", "(1.59e+03,1.62e+03]", "(1.62e+03,1.65e+03]"
        ), class = "factor"), mean_28_day_mortality = c(0.0563498, 0.04886257, 
        0.04467626, 0.04256053, 0.04248667, 0.04009187, 0.03625538, 0.03455094, 
        0.03349542, 0.02892909)), class = c("tbl_df", "tbl", "data.frame"
        ), row.names = c(NA, -10L))
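
The bin labels in this data use scientific notation because cut formats break points to dig.lab significant digits (3 by default). As a small base R check, assuming labels of exactly this shape, as.numeric reads such strings without trouble, and passing dig.lab = 4 when binning gives plain labels instead. The names labels_demo, bounds, mids, and plain are made up for this sketch.

```r
# Two labels copied from the data above
labels_demo <- c("(1.35e+03,1.38e+03]", "(1.62e+03,1.65e+03]")

# Strip any bracket type, split on the comma, and average the two bounds
bounds <- strsplit(gsub("[][()]", "", labels_demo), ",", fixed = TRUE)
mids <- vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))
mids  # 1365 1635

# With dig.lab = 4 the same breaks get plain labels such as "(1350,1380]"
plain <- levels(cut(c(1360, 1640), breaks = seq(1350, 1650, by = 30), dig.lab = 4))
plain[1]  # "(1350,1380]"
```

Because labels are rounded to dig.lab digits, midpoints recovered from labels can differ slightly from the true bin centres when the breaks are not round numbers.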

Thank you! This is extremely helpful. But I wonder, is there any way to make the x axis run exactly from 1350 to 1650, and the y axis from 0 to 1? — lilgrass, Nov 05 '22 at 22:15
@Eveline see my update. I would advise against setting the y axis limits between 0 and 1 because you lose all the detail in your data that way, though the x axis changes look fine. Maybe setting y limits to `c(0, 0.1)` would be a decent compromise? — Allan Cameron, Nov 05 '22 at 22:35
