How can I include a legend inside one of the empty panels of the following matrix plot?

I have color coded different regression lines in the plots. I need a legend based on color.

I believe this answer comes closest to answering my question, yet I do not know exactly how to modify my code to get a color-based legend for the different regression lines.

As for the background of the code, I am trying to study different robust and non-robust regression methods applied to multivariate data with and without outliers.

library(ggplot2)
library(GGally)
library(MASS)
library(robustbase)

## Just create data -- you can safely SKIP this function.
##
## Arguments:
##   k        number of input variables
##   ranges   c(min1, max1, min2, max2, ...), 2k elements; note that the
##            rnorm() call below uses each pair as (mean, sd)
##   b        vector of k + 1 linear coefficients (intercept first)
##   seed     random seed
##   n        number of observations
##   outliers flat vector of outlier rows, k + 1 values per row
##
## Returns a list: the uncontaminated dataframe and the contaminated dataframe
create_data <- function(k, ranges, b, seed = 6, n,
                        outliers = NULL) {
    x <- NULL                           # x: matrix of input variables
    for (i in 1:k) {
        set.seed(seed^i)
        ## x <- cbind(x, runif(n, ranges[2*i-1], ranges[2*i]))
        x <- cbind(x, rnorm(n, ranges[2*i-1], ranges[2*i]))
    }
    set.seed(seed - 2)
    x_aug <- cbind(rep(1, n), x)        # prepend an intercept column
    y <- x_aug %*% b
    y_mean <- mean(y)
    e <- rnorm(n, 0, 0.20 * y_mean)     # Gaussian noise, sd = 20% of mean(y)
    y <- y + e
    df <- data.frame(x = x, y = y)
    len <- length(outliers)
    n_rows <- len %/% (k+1)
    if (!is.null(outliers)) {
        outliers <- matrix(outliers, n_rows, k+1, byrow = TRUE)
        df_contamin <- data.frame(x = rbind(x, outliers[,1:k]), y = c(y, outliers[,k+1]))
    } else {
        df_contamin <- df
    }
    list(df, df_contamin)               # return both data frames
}

# plot different regression models (some are robust) for two types of
# data (one is contaminated with outliers)
plot_models <- function(data, mapping, data2) {
    cb_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
    ## 1.grey, 2.light orange, 3.light blue, 4.green, 5.yellow, 6.blue, 7.red, 8.purple
    plt <- ggplot(data = data, mapping = mapping) +
        geom_point() +
        theme_bw() +
        geom_smooth(method = lm, formula = y ~ x, data = data2, color = cb_palette[3], se = FALSE) +
        geom_smooth(method = lm, formula = y ~ x, color = cb_palette[7], se = FALSE) +
        geom_smooth(method = rlm, formula = y ~ x, color = cb_palette[4], se = FALSE)  +
        geom_smooth(method = lmrob, formula = y ~ x, color = cb_palette[1], se = FALSE)
    plt
}

# trim the upper and right panels of plots
trim_gg <- function(gg) {
    n <- gg$nrow
    gg$nrow <- gg$ncol <- n-1
    v <- 1:n^2
    gg$plots <- gg$plots[v > n & v%%n != 0]
    gg$xAxisLabels <- gg$xAxisLabels[-n]
    gg$yAxisLabels <- gg$yAxisLabels[-1]
    gg
}

dat <- create_data(3, c(1, 10, 1, 10, 1, 10), c(5, 8, 6, 7), 6, 20, c(30, 30, 50, 400))
df <- dat[[1]]
df_contamin <- dat[[2]]

## Note that plot_models is called here
g <- ggpairs(df_contamin, columns = 1:4,
             lower = list(continuous = wrap(plot_models, data2 = df)),
             diag = list(continuous = "blankDiag"),
             upper = list(continuous = "blank")) #, legend = lgd)

gr <- trim_gg(g)
print(gr)

Created on 2019-10-09 by the reprex package (v0.3.0)

Sorry for the long code, but most probably only the plot_models function and the line where ggpairs is called need to be modified.

I want to get a legend in the blank upper half of the plots. It may be done by somehow tweaking the plot_models function, setting the mapping in ggpairs to color using ggplot2::aes_string, and using getPlot and putPlot of the GGally package. But I can't wrap my head around how to do it exactly.
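
As a minimal sketch of the color-mapping idea (not a tested solution for the full matrix plot): mapping `colour` to a constant label inside `aes()` in each `geom_smooth` layer, and supplying the colors through `scale_colour_manual`, makes ggplot2 build a legend for a single panel. The built-in `mtcars` data, the label strings, and the name `model_cols` are illustrative assumptions, not part of the code above.

```r
library(ggplot2)
library(MASS)

# Hypothetical label-to-color lookup; the hex codes are taken from cb_palette.
model_cols <- c("lm" = "#D55E00", "rlm" = "#009E73")

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    theme_bw() +
    # A constant string inside aes() becomes a legend key.
    geom_smooth(aes(colour = "lm"), method = lm, formula = y ~ x, se = FALSE) +
    geom_smooth(aes(colour = "rlm"), method = rlm, formula = y ~ x, se = FALSE) +
    # The manual scale ties each key to its color and titles the legend.
    scale_colour_manual(name = "Model", values = model_cols)
print(p)
```

Inside `plot_models` the same change would presumably apply to each `geom_smooth` call. GGally documents a `legend` argument for `ggpairs` and a `grab_legend()` helper, but whether either can place the legend inside a blank panel after `trim_gg` is not verified here.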

Pooya
  • You are setting `upper` to be blank, i.e. `upper = list(continuous = "blank")`. You should pass your annotation there. See the `Details` section in the `ggpairs` function. – csgroen Oct 09 '19 at 16:21
  • @csgroen Do you mean that I need to write something like a `my_legend` function, then `wrap` it and pass it to `upper`, like this: `upper = list(continuous = wrap(my_legend))`? – Pooya Oct 09 '19 at 17:25
  • That probably wouldn't work that simply. Re-reading your code, I realized that you set the colors manually, with each line drawn by a separate `geom` call, so getting an automatic legend wouldn't be that simple. Generally, if this were just `ggplot`, I'd advise manipulating the `grob` directly and adding your legend there. You can also rewrite your code to fit your models and compute their predictions before plotting, then add the predicted values as variables to your plotting df inside of `plot_models`. Then melt and use them with `geom_line`, which would provide you with an automatic legend. – csgroen Oct 09 '19 at 20:01
  • @csgroen I think the idea in your last comment will work. The code will probably be longer, but it has the benefit of giving out the regression parameters as well, not just plotting the regression line. – Pooya Oct 09 '19 at 20:59
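
The approach discussed in the comments above (fit first, predict, reshape to long form, draw with `geom_line`) might look roughly like this for a single panel. It is a sketch under assumptions: the built-in `mtcars` data stands in for the question's data, only two of the four models are shown, and the names `fits` and `pred` are hypothetical.

```r
library(ggplot2)
library(MASS)

# Fit each model once; the fitted objects also expose the coefficients.
fits <- list(
    lm  = lm(mpg ~ wt, data = mtcars),
    rlm = rlm(mpg ~ wt, data = mtcars)
)

# Stack the predictions in long form: one row per observation per model.
pred <- do.call(rbind, lapply(names(fits), function(nm) {
    data.frame(wt = mtcars$wt,
               mpg = unname(predict(fits[[nm]], newdata = mtcars)),
               model = nm)
}))

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    theme_bw() +
    # Color is mapped to a data column, so the legend is automatic.
    geom_line(data = pred, aes(colour = model))
print(p)

# Regression parameters, one column per model.
sapply(fits, coef)
```

Because `colour` is mapped to the `model` column rather than set per layer, ggplot2 generates the legend on its own, and the fitted objects remain available for inspecting the regression parameters.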

0 Answers