So basically I have created the bins and have the mean of each bin, with these two columns in a data frame. Now I am plotting these two columns, but I want exact numbers as x-axis labels instead of the bin intervals. I am considering renaming each bin by its midpoint. Please look at the pictures: the first one is my current plot and the second is the plot I want to achieve.

My current plot: [image]

What I want to have: [image]

My data frame looks like this: [image]

lilgrass
  • If you're using `cut`, there is a `labels` argument. – rawr Nov 05 '22 at 20:34
  • Hi. You've supplied no code. That makes it hard for others to help you. Please read this and then edit your question to include the suggestions therein: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?r=Saves_AllUserSaves – John Polo Nov 05 '22 at 20:37

2 Answers

If you have groups that (I assume) you made with cut, you could pull the lower and upper bounds out of each label and compute their midpoint before you summarize and plot. Note that I made the regex fairly permissive because I don't know offhand whether cut always closes its intervals on the same side, so it accepts either ( or [ at the start and either ) or ] at the end.

library(tidyverse)

#example like yours
mtcars |>
  mutate(grp = cut(hp, 10)) |>
  group_by(grp) |>
  summarise(mpg_mean = mean(mpg)) |>
  ggplot(aes(grp, mpg_mean))+
  geom_point()


#solution
mtcars |>
  mutate(grp = cut(hp, 10)) |>
  extract(grp, 
          into = c("min", "max"), 
          remove = FALSE,
          regex = "(?:\\(|\\[)(.*),(.*)(?:\\)|\\])",
          convert = TRUE) |>
  mutate(mean_grp = (min + max)/2)|>
  group_by(mean_grp) |>
  summarise(mpg_mean = mean(mpg)) |>
  ggplot(aes(mean_grp, mpg_mean))+
  geom_point()
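
As a small aside on why the regex above accepts both bracket types: the shape of the labels that cut produces depends on its right and include.lowest arguments. The sketch below uses a made-up toy vector and breaks (not the asker's data) just to show the three label shapes.

```r
# Base R only; the toy vector and breaks are made up for illustration.
x <- c(1, 5, 10)

# Default (right = TRUE): intervals are open on the left, closed on the right.
right_closed <- levels(cut(x, breaks = c(0, 5, 10)))

# right = FALSE: intervals are closed on the left, open on the right.
left_closed <- levels(cut(x, breaks = c(0, 5, 10), right = FALSE))

# include.lowest = TRUE: the first interval is closed on both sides.
with_lowest <- levels(cut(x, breaks = c(1, 5, 10), include.lowest = TRUE))

print(right_closed)  # "(0,5]"  "(5,10]"
print(left_closed)   # "[0,5)"  "[5,10)"
print(with_lowest)   # "[1,5]"  "(5,10]"
```

So with the defaults used in this answer the labels look like "(a,b]", but a pattern that tolerates either bracket keeps working if those arguments change.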

EDIT

Here is another option if you just want to relabel the axis and not actually transform the data:

# turn labels like "(a,b]" into the midpoint of a and b
lab_fun <- function(x) {
  str_split(x, ",") |>
    map_dbl(~ mean(parse_number(.x)))
}

mtcars |>
  mutate(grp = cut(hp, 10)) |>
  group_by(grp) |>
  summarise(mpg_mean = mean(mpg)) |>
  ggplot(aes(grp, mpg_mean))+
  geom_point()+
  scale_x_discrete(labels = lab_fun)
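
A third route, following the comment on the question about the labels argument of cut, is to name the bins by their midpoints at the moment they are created, so no string parsing is needed afterwards. This is only a sketch under assumptions: it builds explicit equal-width breaks over the range of hp, which are close to but not exactly the breaks that cut(hp, 10) picks, and the names breaks, mids and plot_df are made up for illustration.

```r
# Base R sketch using the built-in mtcars data.
# Assumption: explicit equal-width breaks, not the ones cut(hp, 10) chooses.
breaks <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 11)

# Midpoint of each consecutive pair of breaks (10 bins -> 10 midpoints).
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Use the midpoints as the bin labels; include.lowest keeps the minimum hp.
grp <- cut(mtcars$hp, breaks = breaks, labels = round(mids, 1),
           include.lowest = TRUE)

# Mean mpg per bin; bins with no cars come back as NA and are dropped.
mpg_mean <- tapply(mtcars$mpg, grp, mean)
plot_df <- data.frame(mid = as.numeric(names(mpg_mean)),
                      mpg_mean = as.vector(mpg_mean))
plot_df <- plot_df[!is.na(plot_df$mpg_mean), ]

print(plot_df)
```

Because mid is numeric, it can be mapped to x in ggplot (for example aes(mid, mpg_mean)) and will get a continuous axis rather than one tick per bin.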

AndS.

To reproduce the style of the plot image you included, you can do:

library(tidyverse)

df %>%
  mutate(bin_group = gsub("\\(|\\]", "", bin_group)) %>%
  separate(bin_group, sep = ",", into = c("lower", "upper")) %>%
  mutate(across(lower:upper, as.numeric)) %>%
  mutate(`Birth weight (g)` = (upper + lower) / 2) %>%
  ggplot(aes(`Birth weight (g)`, mean_28_day_mortality)) +
  geom_vline(xintercept = 1500) +
  geom_point(shape = 18, size = 4) +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "One-year mortality", y = NULL) +
  theme_bw(base_family = "serif", base_size = 20) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "black", size = 0.5),
        plot.title = element_text(hjust = 0.5))


Edit

To set specific axis ranges, use the limits argument of scale_x_continuous and scale_y_continuous:

library(tidyverse)

df %>%
  mutate(bin_group = gsub("\\(|\\]", "", bin_group)) %>%
  separate(bin_group, sep = ",", into = c("lower", "upper")) %>%
  mutate(across(lower:upper, as.numeric)) %>%
  mutate(`Birth weight (g)` = (upper + lower) / 2) %>%
  ggplot(aes(`Birth weight (g)`, mean_28_day_mortality)) +
  geom_vline(xintercept = 1500) +
  geom_point(shape = 18, size = 4) +
  scale_x_continuous(labels = scales::comma, limits = c(1350, 1650),
                     breaks = seq(1350, 1650, 50)) +
  scale_y_continuous(limits = c(0, 0.1), name = NULL) +
  labs(title = "One-year mortality") +
  theme_bw(base_family = "serif", base_size = 20) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "black", size = 0.5),
        plot.title = element_text(hjust = 0.5))

Data used (obtained from image in question using OCR)

df <- structure(list(bin_group = structure(1:10, 
        levels = c("(1.35e+03,1.38e+03]", 
        "(1.38e+03,1.41e+03]", "(1.41e+03,1.44e+03]", "(1.44e+03,1.47e+03]", 
        "(1.47e+03,1.5e+03]", "(1.5e+03,1.53e+03]", "(1.53e+03,1.56e+03]", 
        "(1.56e+03,1.59e+03]", "(1.59e+03,1.62e+03]", "(1.62e+03,1.65e+03]"
        ), class = "factor"), mean_28_day_mortality = c(0.0563498, 0.04886257, 
        0.04467626, 0.04256053, 0.04248667, 0.04009187, 0.03625538, 0.03455094, 
        0.03349542, 0.02892909)), class = c("tbl_df", "tbl", "data.frame"
        ), row.names = c(NA, -10L))
Allan Cameron
  • Thank you! This is extremely helpful. But I wonder, is there any way to make the x-axis run exactly from 1350 to 1650, and the y-axis from 0 to 1? – lilgrass Nov 05 '22 at 22:15
  • @Eveline see my update. I would advise against setting the y-axis limits between 0 and 1 because you lose all the detail in your data that way, though the x-axis changes look fine. Maybe setting the y limits to `c(0, 0.1)` would be a decent compromise? – Allan Cameron Nov 05 '22 at 22:35