
I have the following pre-summarized cost data:

MeanCost Std MedianCost LowerIQR UpperIQR StatusGroup AgeGroup
700 500 650 510 780 Dead Young
800 600 810 666 1000 Alive Young
500 200 657 450 890 Comatose Young
300 400 560 467 670 Dead Old
570 600 500 450 600 Alive Old
555 500 677 475 780 Comatose Old
333 455 300 200 400 Dead Middle
678 256 600 445 787 Alive Middle
1500 877 980 870 1200 Comatose Middle

I wish to create a boxplot from this information, similar to the one below, where each color represents a status group (blue=dead, red=alive, green=comatose) and each cluster of boxes represents an age group (left cluster=young, middle cluster=middle, right cluster=old).

[image: example grouped boxplot]

I know that I don't have min and max, so whiskers are not necessary.
I want to code this in R, and any help would be appreciated! Thank you.

Here is the code I have tried:

 dattest<- data.frame(
  Mean_Cost = c(700,800,500,300,570,555,333,678,1500), 
  Std = c(500,600,200,400,600,500,455,256,877), 
  Median_Cost = c(650,810,657,560,500,677,300,600,980), 
  LowerIQR = c(510,666,450,467,450,475,200,445,870), 
  UpperIQR = c(780,1000,890,670,600,780,400,787,1200), 
  StatusGroup = c(1,2,3,1,2,3,1,2,3),
  AgeGroup = c(1,1,1,2,2,2,3,3,3))

where for StatusGroup 1=dead, 2=alive, 3=comatose
and for AgeGroup 1=young, 2=old, 3=middle

 ggplot(dattest, aes(xmin = AgeGroup-.25, xmax=AgeGroup+.25, ymin=LowerIQR, ymax=UpperIQR)) + 
    geom_rect(fill="transparent", col = "blue") + 
    geom_segment(aes(y=Median_Cost, yend=Median_Cost, x=AgeGroup-.25, xend=AgeGroup+.25), col="blue") + 
    geom_point(mapping=aes(x = StatusGroup, y = Mean_Cost), col="red") +
    scale_x_continuous(breaks=1:3, labels=c("Young","Old","Middle")) + 
    theme_classic()

This code is definitely not giving me what I want.

KKolo
  • Possible duplicate: https://stackoverflow.com/questions/22212885/producing-a-boxplot-in-ggplot2-using-summary-statistics. Please don't post data as an image. It's easier to help you if you include sample data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) such as `dput()` so we can copy/paste the data into R rather than retyping it to test possible solutions. – MrFlick Apr 26 '21 at 02:58
  • Thank you - I edited it so that it is a table that you can copy and paste into a CSV. I also do not think this is a duplicate, because the previous post does not address how to plot data with multiple groupings like this. – KKolo Apr 26 '21 at 03:11
  • The "grouping" you are looking for is just a standard `fill=` aesthetic. I think it would work if you just tried it. Add the code you tried to your question. – MrFlick Apr 26 '21 at 03:12
  • I am able to do it with one group - but not two, that's where I get stuck. – KKolo Apr 26 '21 at 03:14
  • Then please add the code you tried for one group. It's much easier to edit and fix what you started than to begin from scratch. – MrFlick Apr 26 '21 at 03:15
  • Yes - I have added it. Sorry - should've done that before – KKolo Apr 26 '21 at 03:32

1 Answer


Is this what you are trying to do?

library(tidyverse)
df <- tibble::tribble(
  ~MeanCost, ~Std, ~MedianCost, ~LowerIQR, ~UpperIQR, ~StatusGroup, ~AgeGroup,
       700L, 500L,        650L,      510L,      780L,       "Dead",   "Young",
       800L, 600L,        810L,      666L,     1000L,      "Alive",   "Young",
       500L, 200L,        657L,      450L,      890L,   "Comatose",   "Young",
       300L, 400L,        560L,      467L,      670L,       "Dead",     "Old",
       570L, 600L,        500L,      450L,      600L,      "Alive",     "Old",
       555L, 500L,        677L,      475L,      780L,   "Comatose",     "Old",
       333L, 455L,        300L,      200L,      400L,       "Dead",  "Middle",
       678L, 256L,        600L,      445L,      787L,      "Alive",  "Middle",
      1500L, 877L,        980L,      870L,     1200L,   "Comatose",  "Middle"
  )

df %>% 
  mutate(AgeGroup = factor(AgeGroup, levels = c("Young", "Middle", "Old"))) %>% 
  ggplot(aes(x = AgeGroup, fill = StatusGroup)) +
  geom_boxplot(aes(
    lower = LowerIQR, 
    upper = UpperIQR, 
    middle = MedianCost, 
    ymin = MedianCost - Std, 
    ymax = MedianCost + Std),
    stat = "identity", width = 0.5)

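The question mentions that whiskers are not needed and asks for specific colors, while the code above draws whiskers at median ± Std and uses the default palette. A minimal sketch of one way to handle both, assuming it is acceptable to collapse the whiskers by setting `ymin`/`ymax` to the box edges and to set the colors with `scale_fill_manual()`. The name `cost_summary` is only an illustrative stand-in for `df`:

```r
library(ggplot2)

# Summary statistics copied from the question (cost_summary is an illustrative name)
cost_summary <- data.frame(
  MedianCost  = c(650, 810, 657, 560, 500, 677, 300, 600, 980),
  LowerIQR    = c(510, 666, 450, 467, 450, 475, 200, 445, 870),
  UpperIQR    = c(780, 1000, 890, 670, 600, 780, 400, 787, 1200),
  StatusGroup = rep(c("Dead", "Alive", "Comatose"), times = 3),
  AgeGroup    = rep(c("Young", "Old", "Middle"), each = 3)
)

# Fix the display order of both grouping variables
cost_summary$AgeGroup <- factor(cost_summary$AgeGroup,
                                levels = c("Young", "Middle", "Old"))
cost_summary$StatusGroup <- factor(cost_summary$StatusGroup,
                                   levels = c("Dead", "Alive", "Comatose"))

p <- ggplot(cost_summary, aes(x = AgeGroup, fill = StatusGroup)) +
  geom_boxplot(
    aes(lower = LowerIQR, middle = MedianCost, upper = UpperIQR,
        # whiskers end at the box edges, so they have zero length
        ymin = LowerIQR, ymax = UpperIQR),
    stat = "identity", width = 0.5
  ) +
  # colors requested in the question
  scale_fill_manual(values = c(Dead = "blue", Alive = "red", Comatose = "green"))

print(p)
```

With `stat = "identity"` the boxes are drawn directly from the supplied columns, so no raw observations are needed.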

Edit

To add an "x" at the mean you can adjust the position:

df %>% 
  mutate(AgeGroup = factor(AgeGroup, levels = c("Young", "Middle", "Old"))) %>% 
  ggplot(aes(x = AgeGroup, fill = StatusGroup)) +
  geom_boxplot(aes(
    lower = LowerIQR, 
    upper = UpperIQR, 
    middle = MedianCost, 
    ymin = MedianCost - Std, 
    ymax = MedianCost + Std),
    stat = "identity", width = 0.5) +
  geom_point(aes(y = MeanCost),
             position = position_dodge(width = 0.5),
             shape = 4)

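If you would rather start from the numerically coded `dattest` data frame in the question, here is a sketch of the same plot, assuming the codes given there (StatusGroup 1=dead, 2=alive, 3=comatose; AgeGroup 1=young, 2=old, 3=middle). The codes are turned into labelled factors first. `box_width` is a name I made up so that the boxes and the mean markers share one dodge width, which as far as I can tell is what keeps the "x" on top of each box:

```r
library(ggplot2)

# Data frame as defined in the question
dattest <- data.frame(
  Mean_Cost   = c(700, 800, 500, 300, 570, 555, 333, 678, 1500),
  Std         = c(500, 600, 200, 400, 600, 500, 455, 256, 877),
  Median_Cost = c(650, 810, 657, 560, 500, 677, 300, 600, 980),
  LowerIQR    = c(510, 666, 450, 467, 450, 475, 200, 445, 870),
  UpperIQR    = c(780, 1000, 890, 670, 600, 780, 400, 787, 1200),
  StatusGroup = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  AgeGroup    = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# Recode the numeric codes as labelled factors
dattest$StatusGroup <- factor(dattest$StatusGroup, levels = 1:3,
                              labels = c("Dead", "Alive", "Comatose"))
# Codes are 1 = young, 2 = old, 3 = middle; listing the levels as 1, 3, 2
# puts the clusters in the order Young, Middle, Old
dattest$AgeGroup <- factor(dattest$AgeGroup, levels = c(1, 3, 2),
                           labels = c("Young", "Middle", "Old"))

box_width <- 0.5  # hypothetical helper: one width for both layers

p <- ggplot(dattest, aes(x = AgeGroup, fill = StatusGroup)) +
  geom_boxplot(
    aes(lower = LowerIQR, middle = Median_Cost, upper = UpperIQR,
        ymin = Median_Cost - Std, ymax = Median_Cost + Std),
    stat = "identity", width = box_width
  ) +
  # dodge the mean markers by the same width as the boxes
  geom_point(aes(y = Mean_Cost),
             position = position_dodge(width = box_width),
             shape = 4)

print(p)
```

The original attempt mapped `x = StatusGroup` in `geom_point()` while the boxes used `AgeGroup`, which is probably why the markers did not line up with the boxes.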

  • Yes - just about! I want to add a point for the mean, but when I add the `geom_point` code, it puts the mean markers next to the boxplots rather than on top of them. – KKolo Apr 26 '21 at 03:48