I have the following R code, where I want the beta_i in the legend to be rendered as actual Greek-letter betas. Please ignore the comments in the code, which were originally written in Danish. The code is supposed to show the solution path of a ridge regression. The actual code is much longer, with several plots that have the same issue.

library(latex2exp)
library(glmnet)
library(MASS)
library(ggplot2)
library(reshape)
library(gridExtra)

set.seed(10)
Y = rnorm(100)
Y = scale(Y)
X=matrix(rnorm(100*8),ncol=8)
X = scale(X)

fitR = glmnet(X,Y, alpha = 0)
beta = coef(fitR)

temp = as.data.frame(as.matrix(beta)) # Convert to a data frame
temp$coef = row.names(temp) # Create a new column with the coefficient names
temp = temp[temp$coef != "(Intercept)",] # Remove the intercept, which is 0 because the data are standardized
temp = reshape::melt(temp, id = "coef") # Merge the 100 tables into long format
temp$variable = as.numeric(gsub("s", "", temp$variable)) # Rename the variables (strip the "s" prefix)
temp$lambda = fitR$lambda[temp$variable+1] # Look up the lambdas
temp$coef = paste("beta_", gsub("V", "", temp$coef), sep="")


plot1 = ggplot(temp, aes(lambda, value, color = coef)) +
    xlim(0,75) +
    geom_line() +
    ggtitle(TeX("Ridge estimater mod $\\lambda$")) +
    xlab(TeX("$\\lambda$")) + ylab("Estimat") +
    guides(color = guide_legend(title = "")) +
    theme_bw() +
    theme(legend.key.width = unit(3,"lines"))

grid.arrange(plot1)

The important vector, temp$coef, consists of 500 values of beta_i for i = 1, ..., 8. I have tried, without luck, to write:

ggplot(temp, aes(lambda, value, color = paste('TeX("$\\', coef, '$")', sep=''))

but this results in an error: "Error: Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?".

Inspired by this and this post, I replaced the line

guides(color = guide_legend(title = "")) +

with

scale_color_discrete(labels = parse(text= paste("beta[", 1:8, "]", sep=""))) +

which does fix my problem. However, I have two issues with this. First, I end up using the non-LaTeX notation "beta[i]" instead of the LaTeX-style "beta_i", even though I have used LaTeX in the rest of the code. Second, this only works because in my case all entries in temp$coef are of the form "beta_i". If these 8 entries were e.g.

temp$coef = c("alpha_1", "beta_2", ..., "theta_8")

then I would not be able to do the same.

So my question is this: Given a vector of expressions suitable for latex (e.g. c(alpha_1, ..., theta_8)), is there a way to build a legend in a ggplot using the names of this vector?

As this is my first post here, please let me know, if I need to change anything.


Edit: Based on the comments by user2554330, I have tried using scale_color_discrete(labels = TeX(temp$coef)) +, which doesn't give any errors but also doesn't show any names in the legend.

Using $...$ around temp$coef gives the error: Error: unexpected '$' in: " xlab(TeX("$\\lambda$")) + ylab("Estimat")+ scale_color_discrete(labels = TeX($". Writing scale_color_discrete(labels = TeX(\\temp$coef)) + gives a similar error.

I've also tried using: scale_color_discrete(labels = TeX(paste('$\\', unique(temp$coef), '$', sep=''))) +

but this just writes the non-greek beta_1, ..., beta_8 in the legend.

Finally, writing:

scale_color_discrete(labels = TeX(unique(temp$coef)))

achieves half the goal. In the legend it writes beta_i, where i is actually a subscript.

Qwethm

1 Answer

This is a little tricky. The idea is that you can use a function for labels in scale_color_discrete(), and you want that function to convert things like beta_1 into an R expression to use as the label. This seems to work:

 toLabel <- function(x) 
   TeX(paste0("$\\", x, "$"))
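To see what the helper does before plugging it into a plot, it can be called directly on a character vector. The following is only a small sketch: the mixed names alpha_1, beta_2 and theta_8 are illustrative stand-ins taken from the question, and it assumes a version of latex2exp in which TeX() accepts a character vector, which the helper above relies on.

```r
library(latex2exp)

# Same helper as above, repeated so the snippet runs on its own
toLabel <- function(x)
  TeX(paste0("$\\", x, "$"))

# Illustrative names (not the real coefficient names from the model)
nms <- c("alpha_1", "beta_2", "theta_8")

# The intermediate strings are ordinary LaTeX math, e.g. $\alpha_1$
tex_strings <- paste0("$\\", nms, "$")
print(tex_strings)

# TeX() turns each string into a plotmath expression, one per name
labs <- toLabel(nms)
print(length(labs))
```

Because the backslash is prepended to every name, this approach presumably only suits names that begin with a LaTeX command such as \alpha or \beta.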

Then use this in scale_color_discrete:

ggplot(temp, aes(lambda, value, color = coef)) + 
  xlim(0,75) +
  geom_line() + 
  ggtitle(TeX("Ridge estimater mod $\\lambda$"))+
  xlab(TeX("$\\lambda$")) + ylab("Estimat")+
  theme_bw() + 
  theme(legend.key.width = unit(3,"lines")) +
  scale_color_discrete(labels = toLabel)

This gives me this legend:

[screenshot of the resulting legend]
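The same function should also cover the second concern in the question, where the names are not all betas. Here is a minimal sketch on made-up data: the data frame demo and its values are hypothetical and only serve to produce three lines with differently named coefficients.

```r
library(ggplot2)
library(latex2exp)

toLabel <- function(x)
  TeX(paste0("$\\", x, "$"))

# Hypothetical toy data: three "coefficient paths" with mixed Greek names
demo <- data.frame(
  lambda = rep(1:10, times = 3),
  value  = c(1 / (1:10), -2 / (1:10), 0.5 / (1:10)),
  coef   = rep(c("alpha_1", "beta_2", "theta_8"), each = 10)
)

p <- ggplot(demo, aes(lambda, value, color = coef)) +
  geom_line() +
  xlab(TeX("$\\lambda$")) +
  theme_bw() +
  scale_color_discrete(labels = toLabel)  # labels built from the names themselves

print(p)
```

Nothing in the scale call refers to a particular Greek letter, so the legend entries follow whatever names are stored in the coef column.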

user2554330
  • Thank you, this works perfectly! Is it correctly understood, that when I write: "scale_color_discrete(labels = 'some function')", then the function is automatically used on the distinct values of the vector previously put equal to "col" in the "aes"-function? – Qwethm May 17 '20 at 12:18
  • I think that's right. The help page `?scale_color_discrete` isn't very helpful though! – user2554330 May 17 '20 at 12:29
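One way to check this understanding is to pass a labels function that records its argument. The sketch below uses hypothetical names (record_breaks, seen, demo) and assumes that ggplot2 calls the function while the legend is being built, which ggplotGrob() triggers without needing to display the plot.

```r
library(ggplot2)

seen <- NULL

# Hypothetical helper: stores whatever the scale passes in, then returns labels
record_breaks <- function(x) {
  seen <<- x
  toupper(x)
}

# Made-up data in which every group appears more than once
demo <- data.frame(
  x   = 1:6,
  y   = c(1, 3, 2, 4, 3, 5),
  grp = c("b", "a", "c", "a", "b", "c")
)

p <- ggplot(demo, aes(x, y, color = grp)) +
  geom_line() +
  scale_color_discrete(labels = record_breaks)

# Build the plot (including the legend) so the labels function gets called
invisible(ggplotGrob(p))

print(seen)
```

If the understanding in the comment is right, seen holds each distinct value of grp once rather than all six rows of the column.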