
I have this x-y data, which I'm plotting with ggplot2's geom_violin in R:

library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1)),
                 age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d24",600),rep("d8",500),rep("d24",500)),
                 group = c(rep("A",1500),rep("B",1100),rep("C",1000))) %>%
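  dplyr::mutate(time = as.integer(sub("^d", "", age))) %>%  # day number, e.g. "d24" -> 24; as.integer(age) alone gives NA when age is character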
  dplyr::arrange(group,time) %>%
  dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) + 
  geom_violin(alpha=0.5) + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  theme_minimal()

Which gives:

[Image: resulting plot]

Now I'd like to change the x-axis so that there is a single tick per group, labeled with the group name (A, B, C) and centered below that group's violins.

Assuming that:

ggplot_build(ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) + 
  geom_violin(alpha=0.5) + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  theme_minimal())$data[[2]]$xid

gives the x-axis locations, I used scale_x_discrete with breaks set to the midpoint of each group:

ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) + 
  geom_violin(alpha=0.5) + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) + 
  scale_x_discrete(breaks=c(2,4.5,6.5),labels=c("A","B","C")) + theme_minimal()

But this doesn't seem to be giving me the desired outcome:

[Image: resulting plot]

Trying scale_x_continuous instead of scale_x_discrete gives this error:

Error: Discrete value supplied to continuous scale

Any idea how to get the x-axis ticks located at c(2,4.5,6.5), with the labels c("A","B","C")?
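For reference, the target positions follow from the fact that a discrete x scale draws its i-th level at x = i, so each group's midpoint is the mean index of its levels. A minimal sketch, assuming the seven group_age levels come out in the order hard-coded below (copied by hand so the snippet runs on its own):

```r
# Levels of group_age in the order produced by arrange(group, time)
# (hard-coded copy, assumed to match df$group_age above)
lv <- c("A_d3", "A_d8", "A_d24", "B_d3", "B_d24", "C_d8", "C_d24")

# A discrete scale places level i at x = i, so a group's center is the
# mean index of the levels belonging to that group
grp <- sub("_.*", "", lv)                 # "A", "A", "A", "B", "B", "C", "C"
mid <- tapply(seq_along(lv), grp, mean)   # named by group: A, B, C
print(mid)
```

This gives A = 2, B = 4.5 and C = 6.5, i.e. the positions the breaks are meant to hit.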
dan
  • Can't you have `x=group`, and then fix the dodging as described here: [Align violin plots with dodged box plots](https://stackoverflow.com/questions/27012500/align-violin-plots-with-dodged-box-plots/27012593#27012593) – Henrik Jul 19 '20 at 20:36
  • To @Henrik - I'd rather not, because I'd like to add `lm` `regression` lines to each `group` using: `geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T)` and that's not going to work with the `position_dodge` idea – dan Jul 19 '20 at 20:51
  • To @Peter - which part of that post are you referring to? It's rather long. But I'd rather avoid `facet`s – dan Jul 19 '20 at 20:55
  • OK, but I find it a bit strange to fit a regression line when your x-axis seems to be discrete, or am I misunderstanding? – Henrik Jul 19 '20 at 21:21
  • Your time variable is all blank. – Edward Jul 19 '20 at 21:21
  • @Henrik - I have 3 `time` points for `group` `A` and two for `group`s `B` and `C`. Perhaps not the best design, but one can still fit an `lm` to that. @Edward - not sure why you're saying `df$time` is all blank. As far as I see it's not. – dan Jul 19 '20 at 21:29
  • @Peter - I simply want to force the `x-axis` `ticks` to be located at `c(2,4.5,6.5)` and have the `labels` `c("A","B","C")`. I see that @teunbrand's approach is using `scale_x_continuous`. I haven't tried his code that uses `ggh4x`, but as my post says, `scale_x_continuous` gives an error. – dan Jul 19 '20 at 21:52
  • Sorry I think I am over complicating things - I'll delete my comments. – Peter Jul 19 '20 at 21:53

1 Answer

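One way is to make group_age numeric (see the data below), so that scale_x_continuous can be used, and to map the group aesthetic so that each group/age combination still gets its own violin and box plot.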

ggplot(df, aes(x=group_age, y=value, fill=age, color=age, group = cut_width(group_age, 1))) + 
  geom_violin(alpha=0.5) + 
  geom_boxplot(width=0.1, aes(middle=mean(value))) + 
  theme_minimal() + xlab("Group") + 
  scale_x_continuous(breaks=c(6,9.5,14), labels=c("A","B","C"))  # mean x position of each group's violins

[Image: resulting plot]


Data:

set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1)),
                 age = factor(c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d24",600),rep("d8",500),rep("d24",500))),
                 group = factor(c(rep("A",1500),rep("B",1100),rep("C",1000)))) %>%
  mutate(group_age=as.numeric(group)*4+as.numeric(age)) %>%
  arrange(group_age)

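Note that group_age is built with a formula (as.numeric(group)*4 + as.numeric(age)) so that it stays numeric: each group gets its own block of x positions, with one empty slot separating consecutive groups.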
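To avoid hard-coding the breaks, the center of each group can be computed from its distinct group_age values. A minimal sketch on a small hypothetical table `combos` holding one row per group/age combination present in the data (with the full data, the same tapply() call on df$group_age and df$group should give the same result):

```r
# One row per group/age combination present in the data (hypothetical helper table)
combos <- data.frame(
  group = factor(c("A", "A", "A", "B", "B", "C", "C")),
  age   = factor(c("d3", "d8", "d24", "d3", "d24", "d8", "d24"))
)
# Same encoding as above: 4 x slots per group, 1 per age level
# (age levels sort alphabetically: d24 = 1, d3 = 2, d8 = 3)
combos$group_age <- as.numeric(combos$group) * 4 + as.numeric(combos$age)

# Center of each group = mean of its distinct x positions
breaks <- tapply(combos$group_age, combos$group, function(x) mean(unique(x)))
print(breaks)
```

Under this encoding the centers are 6, 9.5 and 14, and they can be passed on as `scale_x_continuous(breaks = breaks, labels = names(breaks))`.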

Packages:

library(ggplot2)
library(dplyr)
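The same idea also works with the question's factor version of group_age, which lets the question's breaks c(2,4.5,6.5) be used unchanged: map x to the level index with as.numeric() and set group = group_age so every level keeps its own violin and box. A minimal sketch, assuming ggplot2 is installed; it rebuilds the question's data with base R instead of dplyr and makes age a factor in day order (d3, d8, d24):

```r
library(ggplot2)

set.seed(1)
# Same random values as the question; age is a factor in day order
df <- data.frame(
  value = c(rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
            rnorm(500, 2, 2), rnorm(600, 7, 0.5), rnorm(500, 3, 1),
            rnorm(500, 3, 1)),
  age = factor(c(rep("d3", 500), rep("d8", 600), rep("d24", 400),
                 rep("d3", 500), rep("d24", 600), rep("d8", 500),
                 rep("d24", 500)),
               levels = c("d3", "d8", "d24")),
  group = c(rep("A", 1500), rep("B", 1100), rep("C", 1000))
)
# Sort by group, then by day, and fix the level order of group_age
df <- df[order(df$group, df$age), ]
df$group_age <- factor(paste0(df$group, "_", df$age),
                       levels = unique(paste0(df$group, "_", df$age)))

# as.numeric(group_age) turns the levels into positions 1..7, which a
# continuous scale accepts; group = group_age keeps one violin/box per level
p <- ggplot(df, aes(x = as.numeric(group_age), y = value,
                    fill = age, color = age, group = group_age)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.1) +
  scale_x_continuous(breaks = c(2, 4.5, 6.5), labels = c("A", "B", "C")) +
  theme_minimal() +
  xlab("group")
p
```

Because x is now numeric, a geom_smooth() layer mapped to the same numeric x should also share this axis.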
Edward