
This question has been asked before, but without a reproducible example. Unfortunately, I have run into the same problem with my data set and would like some help. I am an R beginner, but I have tried to make a reproducible example. Sorry if the code is clunky!

library(ggplot2)

IND <- 1:20                                        # individual ID
G   <- c(rep("A", 10), rep("B", 5), rep("C", 5))   # main group
SG1 <- c(rep("1", 3), rep("2", 2), rep("1", 2), rep("2", 3), rep("3", 5), rep("4", 5))  # subgroup
SG2 <- c(rep("1", 4), rep("2", 4), rep("3", 4), rep("4", 3), rep("", 5))                # severity, empty for C
X   <- c(7, 8, 9, 10, 11, 12, 11, 10, 11, 12, 25, 27, 29, 26, 28, 35, 36, 37, 38, 39)   # value to plot

df <- data.frame(IND, G, SG1, SG2, X)

My data frame has five columns. IND: the individual's name. G: the main group, with three classes (A, B, C). SG1: subgroups 1, 2, ... of A; B and C have no real subgroups, but while plotting I found it better to give them a number (3 and 4) than to leave the values empty. SG2: disease severity (1, 2, 3, 4), which applies only to main groups A and B. X: the value to be plotted.
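
A quick cross-tabulation is one way to check this layout before plotting. The sketch below uses base R only and rebuilds the same G, SG1 and SG2 vectors so that it runs on its own:

```r
# Same grouping vectors as in the example data above.
G   <- c(rep("A", 10), rep("B", 5), rep("C", 5))
SG1 <- c(rep("1", 3), rep("2", 2), rep("1", 2), rep("2", 3), rep("3", 5), rep("4", 5))
SG2 <- c(rep("1", 4), rep("2", 4), rep("3", 4), rep("4", 3), rep("", 5))

# Rows are main groups, columns are subgroups.
g_by_sg1 <- table(G, SG1)
g_by_sg2 <- table(G, SG2)

print(g_by_sg1)  # A splits into SG1 "1" and "2"; B is all "3"; C is all "4"
print(g_by_sg2)  # C only has the empty string as its SG2 value
```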

I am trying to plot with ggplot so that the x axis is the main group G and the y axis is the value X. I want boxplots that show the SG1 breakdown: ideally, two box-and-whisker plots under A, one for B, and one for C.

I achieved this with

ggplot(data = df, aes(x= G, y= X, fill= SG1))+
  geom_boxplot()

Now I want to add the individual points to see the distributions. The points should have a shape according to the subgroup, SG1.

I have tried

ggplot(data = df, aes(x= G, y= X, fill= SG1))+
  geom_boxplot()+
  geom_point(shape= factor(SG1))

Unfortunately, the points fall between the SG1 boxes 1 and 2 for group A. Ideally, their positions should line up with the boxplots.

Any suggestions on this?

Thanks!

  • You can just about make it work with `geom_point(shape= factor(SG1), position = position_dodge(0.75) )`. More details are [in this question](https://stackoverflow.com/questions/63038423/default-spacing-of-grouped-boxplots-in-ggplot2-how-to-derive-correct-position-d) – Miff Sep 03 '21 at 08:26
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Sep 03 '21 at 16:24

1 Answer


This should do what you want:

ggplot(data = df, aes(x= G, y= X, fill= SG1)) +
    geom_boxplot(position=position_dodge(width=1)) +
    geom_point(aes(shape= factor(SG1), group=factor(SG1)),
               position=position_dodge(width=1))

Grouped boxplots are already dodged by default, but points are not. You have to specify position_dodge explicitly on both layers so that the width (how far apart the boxplots of the different groups are set) is the same for geom_boxplot and geom_point.
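
To see why the widths have to match, here is a rough sketch of the arithmetic. It assumes that dodging places the centre of the i-th of n groups sharing an x position at x + width * ((i - 0.5) / n - 0.5), which is my reading of how ggplot2 dodges; the helper `dodge_centers` is hypothetical and not part of ggplot2.

```r
# Hypothetical helper (not a ggplot2 function): centres of n dodged groups
# that share the x position `x`, for a given dodge width.
dodge_centers <- function(x, n, width) {
  i <- seq_len(n)
  x + width * ((i - 0.5) / n - 0.5)
}

# Group A sits at x = 1 and has two subgroups.
centers_wide   <- dodge_centers(1, n = 2, width = 1)     # 0.75   1.25
centers_narrow <- dodge_centers(1, n = 2, width = 0.75)  # 0.8125 1.1875

# Groups B and C have a single subgroup, so nothing is shifted.
centers_single <- dodge_centers(2, n = 1, width = 1)     # 2
```

Under that assumption, boxes dodged with one width and points dodged with another end up at different x positions, which is the kind of misalignment described in the question. Using the same width on both layers puts them at the same centres.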


Edit:
Adding shapes varying by SG2:

ggplot(data = df, aes(x= G, y= X, fill= SG1))+
    geom_boxplot(position=position_dodge(width=1))+
    geom_point(aes(shape= factor(SG2), group=factor(SG1)),
                   position=position_dodge(width=1))+
    guides(shape=guide_legend(title="SG2"))
benimwolfspelz
  • It worked! I got the plot I wanted! Thank you so much. – Nidhi Desai Sep 03 '21 at 08:25
  • As an add-on, I was wondering if I could color/shape the dots by SG2 instead of SG1. It returns an error, as SG2 has empty cells in the last 5 observations. – Nidhi Desai Sep 03 '21 at 08:27
  • You can do that, see my edit. I would recommend not using two different color codings (`fill` for SG1 and `color` for SG2) in the same plot; better to stick with shapes, as I did. – benimwolfspelz Sep 03 '21 at 09:28
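
On the empty SG2 cells mentioned in the comments: one option is to recode the empty strings to an explicit label before mapping the column to shape, so the legend shows a named entry instead of a blank one. This is only a sketch in base R; the label "not scored" and the name `SG2_labelled` are made up for illustration.

```r
# SG2 as defined in the question: the last 5 observations are empty strings.
SG2 <- c(rep("1", 4), rep("2", 4), rep("3", 4), rep("4", 3), rep("", 5))

# Recode "" to an explicit level; "not scored" is an assumed label.
SG2_labelled <- factor(
  ifelse(SG2 == "", "not scored", SG2),
  levels = c("1", "2", "3", "4", "not scored")
)

# Count the observations per level, including the recoded ones.
print(table(SG2_labelled))
```

Stored as a column of df, this factor could then be mapped with `aes(shape = SG2_labelled)` in place of `factor(SG2)`.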