
I am using geom_histogram in R to produce a histogram using the code:

ggGender <- ggplot(dfGenderGrouped, aes(log(freq), fill=dfGenderGrouped$name)) +
  geom_histogram(data=dfGenderGrouped, binwidth = 1, alpha=0.5, color="black") +
  theme_bw() +
  theme(axis.title = element_text(size=16),
        legend.text = element_text(size=12),
        axis.text.y = element_text(size=12, angle=45),
        axis.text.x = element_text(size=12),
        legend.position=c(0.8,0.7)) +
  ylab("Number of patients") +
  xlab("Events (log)") +
  labs(fill="Events") +
  scale_y_continuous(labels = comma) +
  scale_fill_brewer(palette="Spectral")


The dfGenderGrouped data frame looks like:

  patid freq              name Group
1  1156    1 Male - All events   All
2  1194    1 Male - All events   All
3  1299    1 Male - All events   All
4  1445    1 Male - All events   All
5  1476    2 Male - All events   All
6  2045    2 Male - All events   All

The unique values of name are shown in the legend. The unique values of Group are:

> unique(dfGenderGrouped$Group)
[1] "All"      "Clinical" "Referral" "Therapy"

I would like to organise the stacks by the Group value. For example, bin 0 would have a stacked column of Female - All events and Male - All events, and the same stacked column would appear in bin 1, and so on. Likewise, I would like Female - Clinical events and Male - Clinical events as a single stacked column across the bins. Thus, each stacked column has the Group value in common (All, Clinical, Referral, or Therapy).

To clarify further, bin 0 would have the following column stacks (organised by Group in the data.frame):

Female - All events & Male - All events
Female - Clinical events & Male - Clinical events
Female - Referral events & Male - Referral events
Female - Therapy events & Male - Therapy events

Then for bin 1 the same:

Female - All events & Male - All events
Female - Clinical events & Male - Clinical events
Female - Referral events & Male - Referral events
Female - Therapy events & Male - Therapy events

Help is much appreciated.

Anthony Nash
  • The data may not be sufficient to make a plot, please make it more [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – NelsonGon Feb 24 '20 at 10:35
  • I think you're wanting to both stack _and_ dodge the same histogram, which isn't really possible, though there are hacks to achieve the effect : see https://stackoverflow.com/questions/12715635/ggplot2-bar-plot-with-both-stack-and-dodge – Allan Cameron Feb 24 '20 at 10:46

1 Answer


What about faceting your graph by the "Group" column, such as:

library(ggplot2)
ggplot(data = df, aes(log(Freq), fill = Name))+
    geom_histogram(binwidth = 1, alpha = 0.5,color = "black")+
    facet_wrap(.~Group,nrow = 1, scales = "fixed")+
    labs(x = "Events (log)", y = "Number of patients", fill="Events") + 
    scale_fill_brewer(palette="Spectral")


EDIT: Simplify the legend

To simplify the legend, you can plot just Male and Female with facet_wrap. You only need to edit your "Name" column to remove everything from the hyphen onwards, so that only the Male / Female part is kept:

df$Name <- sub(" -.*", "", df$Name)  # drop " - <group>" and keep only the gender label
ggplot(data = df, aes(log(Freq), fill = Name))+
  geom_histogram(binwidth = 1, alpha = 0.5,color = "black")+
  facet_wrap(.~Group,nrow = 1, scales = "fixed")+
  labs(x = "Events (log)", y = "Number of patients", fill="Events") + 
  scale_fill_brewer(palette="Spectral")
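
As a quick check of the string edit on its own, here is a minimal sketch using two labels in the format shown in the question. The vector `name` is only illustrative. Note that a pattern such as `" -.*"` also removes the space before the hyphen, whereas `"-.*"` would leave a trailing space in each label.

```r
# Two example labels in the "<gender> - <group> events" format from the question
name <- c("Male - All events", "Female - Clinical events")

# Remove the first " -" and everything after it
gender <- sub(" -.*", "", name)
print(gender)

# For comparison: without the leading space in the pattern,
# the labels keep a trailing space ("Male ", "Female ")
gender_trailing <- sub("-.*", "", name)
print(nchar(gender_trailing) - nchar(gender))
```
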


Alternative using grid.arrange

Alternatively, you can create 4 plots and arrange them on a single figure using the grid.arrange function from the gridExtra package. That way, you will have a legend for each plot:

library(gridExtra)
ALL <- ggplot(data = subset(df, Group == "ALL"), aes(log(Freq), fill = Name))+
  geom_histogram(binwidth = 1, alpha = 0.5,color = "black")+
  labs(x = "Events (log)", y = "Number of patients", fill="Events", title = "ALL") + 
  scale_fill_brewer(palette="Spectral")+
  scale_x_continuous(limits = c(4,9), breaks = 4:9)+
  theme_bw()+
  theme(legend.position=c(0.3,0.7),
        legend.text = element_text(size=8),
        legend.title = element_text(size = 8))

Clin <- ggplot(data = subset(df, Group == "Clin"), aes(log(Freq), fill = Name))+
  geom_histogram(binwidth = 1, alpha = 0.5,color = "black")+
  labs(x = "Events (log)", y = "Number of patients", fill="Events", title = "Clinical") + 
  scale_fill_brewer(palette="Spectral")+
  scale_x_continuous(limits = c(4,9), breaks = 4:9)+
  theme_bw()+
  theme(legend.position=c(0.3,0.7),
        legend.text = element_text(size=8),
        legend.title = element_text(size = 8))

Ref <- ggplot(data = subset(df, Group == "Ref"), aes(log(Freq), fill = Name))+
  geom_histogram(binwidth = 1, alpha = 0.5,color = "black")+
  labs(x = "Events (log)", y = "Number of patients", fill="Events", title = "Ref") + 
  scale_fill_brewer(palette="Spectral")+
  scale_x_continuous(limits = c(4,9), breaks = 4:9)+
  theme_bw()+
  theme(legend.position=c(0.3,0.7),
        legend.text = element_text(size=8),
        legend.title = element_text(size = 8))

Ther <- ggplot(data = subset(df, Group == "Ther"), aes(log(Freq), fill = Name))+
  geom_histogram(binwidth = 1, alpha = 0.5,color = "black")+
  labs(x = "Events (log)", y = "Number of patients", fill="Events", title = "Ther") + 
  scale_fill_brewer(palette="Spectral")+
  scale_x_continuous(limits = c(4,9), breaks = 4:9)+
  theme_bw()+
  theme(legend.position=c(0.3,0.7),
        legend.text = element_text(size=8),
        legend.title = element_text(size = 8))

grid.arrange(nrow = 1, ALL, Clin, Ref, Ther)
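
Since the four plots differ only in the subset and the title, the same figure could probably be built with a small helper and `lapply`. This is only a sketch: `make_panel` is a name I made up, the data is the simulated example from this answer (with a seed added as an assumption so it is repeatable), and I left out the legend positioning to keep it short. It relies on `grid.arrange` accepting a list of plots through its `grobs` argument.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # assumption: fixed seed so the simulated data is repeatable
df <- data.frame(Group = rep(c("ALL", "Clin", "Ref", "Ther"), each = 50),
                 Name = rep(rep(c("M", "F"), each = 25), 4),
                 Freq = sample(1:10000, 200, replace = TRUE))
df$Name <- paste(df$Name, df$Group, sep = " - ")

# Hypothetical helper: one histogram for a single Group value
make_panel <- function(grp) {
  ggplot(data = subset(df, Group == grp), aes(log(Freq), fill = Name)) +
    geom_histogram(binwidth = 1, alpha = 0.5, color = "black") +
    labs(x = "Events (log)", y = "Number of patients", fill = "Events", title = grp) +
    scale_fill_brewer(palette = "Spectral") +
    scale_x_continuous(limits = c(4, 9), breaks = 4:9) +
    theme_bw() +
    theme(legend.text = element_text(size = 8),
          legend.title = element_text(size = 8))
}

# One plot per group, in order of first appearance
plots <- lapply(unique(df$Group), make_panel)

# Arrange the list of plots on a single row
grid.arrange(grobs = plots, nrow = 1)
```

As in the longer version, values of log(Freq) outside the 4 to 9 range are dropped with a warning.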


Does this look like what you are trying to achieve? If not, can you clarify your question?


NB: Please take a look at my code for the usual way of writing a ggplot2 call. For example, once you have declared the dataframe using data =, you no longer need $ to refer to column names.
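
To illustrate why this matters, here is a small base R sketch (no plotting) using simulated data in the same shape as the reproducible example in this answer. The name `all_only` is just for illustration. When a subset is passed as data, a bare column name is looked up in that subset, whereas `df$Name` always returns the full column of the original dataframe, so the two no longer line up.

```r
# Simulated data in the same shape as the example in this answer
df <- data.frame(Group = rep(c("ALL", "Clin", "Ref", "Ther"), each = 50),
                 Name = rep(rep(c("M", "F"), each = 25), 4))
df$Name <- paste(df$Name, df$Group, sep = " - ")

# The subset you would pass as data = ...
all_only <- subset(df, Group == "ALL")

nrow(all_only)          # rows in the subset: 50
length(df$Name)         # df$Name ignores the subset: 200 values
unique(all_only$Name)   # only the "ALL" labels
unique(df$Name)         # labels from every group
```
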


Reproducible example:

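set.seed(42)  # added so that sample() returns the same values on every run
df <- data.frame(Group = rep(c("ALL","Clin","Ref","Ther"), each = 50),
                 Name = rep(rep(c("M","F"), each = 25), 4),
                 Freq = sample(1:10000, 200, replace = TRUE),
                 Patient = sample(1000:5000, 200, replace = TRUE))
# Build labels such as "M - ALL", similar to the name column in the question
df$Name <- paste(df$Name, df$Group, sep = " - ")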
dc37
  • Thank you dc37. That looks great. I've tried using faceting and, although it does produce a nice display, for the sake of publications (and therefore at the request of my line manager) I would like to simplify the legend. I've looked into the documentation and there doesn't appear to be any means of adding a small legend per facet. As for how to properly use ggplot2, I like to keep the data.frame explicitly defined. – Anthony Nash Feb 24 '20 at 13:20
  • I edited my answer to propose two alternatives for plotting your individual groups: one with a single simplified legend and one with a legend for each panel. Let me know if that is what you are looking for. Also, by using $ to call column names, you are calling the dataframe outside of `ggplot2` (bypassing the dataframe you register using `data = ...`). In most cases you won't notice the difference, but if you are trying to plot a subset of the dataframe and you call the variable using $, the subset won't take effect. – dc37 Feb 24 '20 at 16:42
  • Thank you for the thorough examples and explanations! The grid.arrange code explanation has been very helpful. – Anthony Nash Feb 27 '20 at 19:23