How to write the Pearson correlation coefficient in the lower panel of a scatterplot matrix when data has 2 levels?

Question

I would like to generate a matrix of scatterplots from the following data frame.

# Generate some fake data
set.seed(123)
fakeData <- rnorm(10)
df <- data.frame(Type=c(rep("A", 5), rep("B", 5)), 
                 Syst=fakeData, Bio=2*fakeData, Blr=fakeData^2)

If I use the pairs function, I get scatterplots both below and above the diagonal of my scatterplot matrix.

I want to keep the scatterplots in the upper panel, but I would like to "plot" the correlation coefficient of my data in the lower panel.

I have looked for an answer online and found some good explanations (here, here, here, and here), but I have had no success so far. While illuminating, these examples don't cover cases where the data frame contains data with different levels.

As my data indicate, there are two levels in my data frame, "A" and "B". Hence, I'd like to have two correlation coefficients in each "box" of my lower panel, one for the data whose level is A and another for the data whose level is B. For instance, in plotting pairs(df[2:4]), I'd like to see these two coefficients in the first box of the second row (lower panel) of my matrix.

This line of code

pairs(df[2:4], main="", pch=21, bg=c(A="red", B="blue")[df$Type], lower.panel=NULL)

will plot the scatterplot matrix on the upper panel. By assigning color options to bg, I can differentiate between A and B data points. Ideally, each Pearson correlation coefficient will be printed in the same color as its respective data points.

Attempt #1 - I took the commented function below and changed it a bit to accommodate what is needed for the desired result.

# panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
# {
#   usr <- par("usr"); on.exit(par(usr))
#   par(usr = c(0, 1, 0, 1))
#   r <- abs(cor(x, y))
#   txt <- format(c(r, 0.123456789), digits=digits)[1]
#   txt <- paste(prefix, txt, sep="")
#   if(missing(cex.cor)) cex.cor <- 2
#   text(0.5, 0.5, txt, cex = cex.cor)
# }

I know my data frame "df" has 10 rows. Suppose I want to print the correlation of only the data whose level is A in the lower panel. I thought of changing x and y dimensions to restrict both variables to take only level-A data.

panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
  x <- x[1:5,1:3]
  y <- y[1:5,1:3]
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits=digits)[1]
  txt <- paste(prefix, txt, sep="")
  if(missing(cex.cor)) cex.cor <- 2
  text(0.5, 0.5, txt, cex = cex.cor)
}

Unfortunately, this didn't work. I get an error message that says "incorrect number of dimensions".
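For context, pairs hands each panel function x and y as plain numeric vectors, so two-dimensional indexing such as x[1:5,1:3] fails with exactly this error. One possible way around it, shown here only as a sketch, is to build the panel function from the grouping column and subset x and y by level inside it. The helper names cor_by_group and make_panel_cor and the named color vector cols are illustrative assumptions, not part of the original post. Unlike the function above, the sketch keeps the sign of the coefficient instead of taking abs().

```r
# Same fake data as in the question
set.seed(123)
fakeData <- rnorm(10)
df <- data.frame(Type = c(rep("A", 5), rep("B", 5)),
                 Syst = fakeData, Bio = 2 * fakeData, Blr = fakeData^2)

# Hypothetical helper: Pearson correlation of x and y within each level
# of `groups`, returned as a numeric vector named by level.
cor_by_group <- function(x, y, groups) {
  groups <- factor(groups)
  sapply(levels(groups), function(g) cor(x[groups == g], y[groups == g]))
}

# Hypothetical helper: returns a panel function for pairs() that prints
# one coefficient per level, each in that level's color.
# Assumption: `cols` is a character vector named by the levels of `groups`.
make_panel_cor <- function(groups, cols, digits = 2, cex.cor = 1.5) {
  function(x, y, ...) {
    # Switch to unit coordinates for text placement, restore on exit
    usr <- par("usr"); on.exit(par(usr = usr))
    par(usr = c(0, 1, 0, 1))
    r <- cor_by_group(x, y, groups)
    # Stack the labels vertically around the middle of the box
    ypos <- seq(0.7, 0.3, length.out = length(r))
    for (i in seq_along(r)) {
      txt <- paste0(names(r)[i], ": ", format(r[[i]], digits = digits))
      text(0.5, ypos[i], txt, col = cols[[names(r)[i]]], cex = cex.cor)
    }
  }
}

cols <- c(A = "red", B = "blue")
pairs(df[2:4], main = "", pch = 21, bg = cols[df$Type],
      lower.panel = make_panel_cor(df$Type, cols))
```

Because the returned panel function closes over the full Type column, this only works when pairs passes the complete columns to the panel, which is the case for the call above.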

score 0 · Answer 1 · answered Jun 25 '19 at 13:10

The function ggscatmat from the GGally package will do the trick.

For example, for the generated data, a satisfactory scatterplot matrix will be plotted with

library(GGally)
ggscatmat(df, columns = 2:4, color = "Type", alpha = 0.25)

Further ggplot specifications, such as scale_color_... and theme, will work as well. Of course, as with any package function, one may need to tweak it a bit to get the desired result. However, this function is an excellent start.
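As a rough illustration of adding ggplot specifications, the sketch below assumes that ggscatmat returns an ordinary ggplot object, so layers can be appended with +. The choice of scale_color_manual with red and blue, and of theme_bw, is only an example meant to mirror the colors used in the question.

```r
library(GGally)   # provides ggscatmat()
library(ggplot2)  # provides scale_color_manual() and theme_bw()

# Same fake data as in the question
set.seed(123)
fakeData <- rnorm(10)
df <- data.frame(Type = c(rep("A", 5), rep("B", 5)),
                 Syst = fakeData, Bio = 2 * fakeData, Blr = fakeData^2)

# Scatterplot matrix with per-level correlations, then custom colors and theme
p <- ggscatmat(df, columns = 2:4, color = "Type", alpha = 0.25) +
  scale_color_manual(values = c(A = "red", B = "blue")) +
  theme_bw()
print(p)
```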
