
I'm using ggpairs for data with 3 groups. The problem is that not all variables have all groups and therefore, some correlations only need to show 2 groups. Because of the automatic alphabetical ordering of the groups by ggpairs, the colouring is not consistent. The first colour is always assigned to the first factor level. (For example: group 1 = red, group 2 = blue, group 3 = green. But with variables having only the second and last group: group 2 = red and group 3 = blue.)

I tried to solve this problem myself by adding a scale_colour_manual in the following way:

scale_colour_manual(values = c("group1"="#F8766D", "group2"="#00BA38", "group3"="#619CFF"))

This seems to work for the density plots on the diagonal (ggally_densityDiag) and for the scatter plots in the lower part (ggally_points), but for the correlations (ggally_cor) I only get the overall (black) correlations and none of the coloured group correlations anymore. Before the change they were displayed, although with the wrong matching of colours and groups. Why are they not displayed anymore?
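To make the difference between unnamed and named values concrete, here is a minimal sketch in plain ggplot2. It uses made-up toy data rather than output.b, and the names toy, base_plot and colours_used are placeholders introduced only for this illustration.

```r
library(ggplot2)

# Palette keyed by group name (colours taken from the question).
group_colours <- c(group1 = "#F8766D", group2 = "#00BA38", group3 = "#619CFF")

# Toy data in which group1 is absent, as for some of the variables.
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 1, 4, 3),
  group = c("group2", "group2", "group3", "group3")
)

base_plot <- ggplot(toy, aes(x, y, colour = group)) + geom_point()

# Unnamed values are handed out in level order, so group2 gets the first colour.
p_unnamed <- base_plot + scale_colour_manual(values = unname(group_colours))

# Named values are matched by level name, so each group keeps its own colour.
p_named <- base_plot + scale_colour_manual(values = group_colours)

# Helper for inspecting which colours the first layer actually uses.
colours_used <- function(p) unique(layer_data(p)$colour)

print(colours_used(p_unnamed))
print(colours_used(p_named))
```

On this toy data the unnamed scale should shift the colours by one group, while the named scale should keep group2 and group3 on the colours assigned to them in the palette.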

The following code generates this plot; the colours and groups do not match.

ggpairs(output.b[,c(13,17,18)], aes(colour = as.factor(output.b$country), alpha = 0.4),
upper = list(continuous = function(data, mapping, ...) {
  ggally_cor(data = output.b, mapping = mapping) + scale_colour_manual(values = c("#F8766D", "#00BA38", "#619CFF"))}),
lower = list(continuous = function(data, mapping, ...) {
  ggally_points(data = output.b, mapping = mapping) + scale_colour_manual(values = c("#F8766D", "#00BA38", "#619CFF"))}),
diag = list(continuous = function(data, mapping, ...) {
  ggally_densityDiag(data = output.b, mapping = mapping) + scale_fill_manual(values = c("#F8766D", "#00BA38", "#619CFF"))}))

The adapted code generates this plot; the coloured group correlations are no longer displayed.

ggpairs(output.b[,c(13,17,18)], aes(colour = as.factor(output.b$country), alpha = 0.4),
upper = list(continuous = function(data, mapping, ...) {
  ggally_cor(data = output.b, mapping = mapping) + scale_colour_manual(values = c("group1"="#F8766D", "group2"="#00BA38", "group3"="#619CFF"))}),
lower = list(continuous = function(data, mapping, ...) {
  ggally_points(data = output.b, mapping = mapping) + scale_colour_manual(values = c("group1"="#F8766D", "group2"="#00BA38", "group3"="#619CFF"))}),
diag = list(continuous = function(data, mapping, ...) {
  ggally_densityDiag(data = output.b, mapping = mapping) + scale_fill_manual(values = c("group1"="#F8766D", "group2"="#00BA38", "group3"="#619CFF"))}))
bdemarest
lvdb

2 Answers


[UPDATE] After a lot of searching and trying I discovered the problem, but did not manage to solve it. To change the color of 'group3: ' in the upper correlations to blue, I have to isolate these plots and do the scale_colour_manual like in the following code:

p <- ggpairs(...)
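# the colour scale in these panels is keyed by the full label text, not by the group name
p[1,2] <- p[1,2] + scale_colour_manual(values = c("group3: 0.113" = "#619CFF"))
p[1,3] <- p[1,3] + scale_colour_manual(values = c("group3: 0.268" = "#619CFF"))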

It is far too cumbersome to do all this manually since I have to make several of these plots with different groupings and I have many more variables... Is there any way to implement this automatically in ggally_cor?
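One way to avoid typing the labels by hand is to derive the named colour vector from the labels themselves. The sketch below is only that: it assumes the colour keys in the correlation panels have the form "group: value", as in the two labels above, and colours_for_labels is a hypothetical helper, not part of GGally.

```r
# Palette keyed by group name (colours taken from the question).
group_colours <- c(group1 = "#F8766D", group2 = "#00BA38", group3 = "#619CFF")

# Hypothetical helper: give every label such as "group3: 0.113" the colour
# of the group named before the first colon.
colours_for_labels <- function(labels, palette) {
  groups <- sub(":.*$", "", labels)          # "group3: 0.113" -> "group3"
  setNames(unname(palette[groups]), labels)  # colour vector keyed by full label
}

label_colours <- colours_for_labels(c("group3: 0.113", "group3: 0.268"), group_colours)
print(label_colours)

# Assumed usage on a single panel of the ggpairs object p:
# p[1, 2] <- p[1, 2] + scale_colour_manual(values = label_colours)
```

How to read the labels out of each panel depends on the GGally version, so that step is left out here.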

lvdb

I had the same issue, so I rewrote the ggally_cor function from scratch. The replacement labels each correlation with the plain group name, so the only extra thing you need to do is specify "Overall Corr"="black" in scale_color_manual.

library(dplyr)
library(ggplot2)
library(GGally)

# set dplyr functions
select <- dplyr::select; rename <- dplyr::rename; mutate <- dplyr::mutate; 
summarize <- dplyr::summarize; arrange <- dplyr::arrange; slice <- dplyr::slice; filter <- dplyr::filter; recode<-dplyr::recode

# remove obs for setosa
data = iris %>% mutate(Sepal.Length = ifelse(Species=="setosa",NA,Sepal.Length))

mycorrelations <- function(data,mapping,...){
    data2 = data
    data2$x = as.numeric(data[,as_label(mapping$x)])
    data2$y = as.numeric(data[,as_label(mapping$y)])
    data2$group = data[,as_label(mapping$colour)]
    
    correlation_df = data2 %>% 
        bind_rows(data2 %>% mutate(group="Overall Corr")) %>%
        group_by(group) %>% 
        filter(sum(!is.na(x),na.rm=TRUE)>1) %>%
        filter(sum(!is.na(y),na.rm=TRUE)>1) %>%
        summarize(estimate = round(as.numeric(cor.test(x,y,method="spearman")$estimate),2),
                  pvalue = cor.test(x,y,method="spearman")$p.value,
                  pvalue_star = as.character(symnum(pvalue, corr = FALSE, na = FALSE, 
                                                    cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1), 
                                                    symbols = c("***", "**", "*", "'", " "))))%>%
        group_by() %>%
        mutate(group = factor(group, levels=c(as.character(unique(sort(data[,as_label(mapping$colour)]))), "Overall Corr")))
    
    ggplot(data=correlation_df, aes(x=1,y=group,color=group))+
        geom_text(aes(label=paste0(group,": ",estimate,pvalue_star)))
}


ggpairs(data,columns=1:4,
        mapping = ggplot2::aes(color=Species), 
        upper = list(continuous = mycorrelations))+
    scale_color_manual(values=c("setosa"="orange","versicolor"="purple","virginica"="brown","Overall Corr"="black"))
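The grouping step inside mycorrelations can also be checked on its own, without any plotting. The following is a rough base-R sketch of that step, not the function above: group_correlations is a hypothetical name, and unlike the dplyr version it keeps only complete (x, y) pairs and skips groups with fewer than three of them.

```r
# Spearman correlation per group plus one overall entry.
group_correlations <- function(x, y, group, overall_label = "Overall Corr") {
  complete <- !is.na(x) & !is.na(y)
  x <- x[complete]
  y <- y[complete]
  group <- as.character(group[complete])

  # Row indices per group, followed by all rows for the overall entry.
  pieces <- c(split(seq_along(x), group),
              setNames(list(seq_along(x)), overall_label))
  pieces <- pieces[lengths(pieces) >= 3]

  vapply(pieces, function(idx) {
    round(cor(x[idx], y[idx], method = "spearman"), 2)
  }, numeric(1))
}

# Same setup as in the answer: Sepal.Length is missing for setosa.
iris_na <- iris
iris_na$Sepal.Length[iris_na$Species == "setosa"] <- NA

result <- group_correlations(iris_na$Sepal.Length, iris_na$Sepal.Width, iris_na$Species)
print(result)
```

Because setosa has no complete pairs, it drops out of the result, and the remaining names are exactly the keys that scale_color_manual needs.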


Isaac Zhao