
I'm trying to make multiple boxplots side by side with ggplot2. I've been following the steps in "Multiple boxplots placed side by side for different column values in ggplot", but without much luck.

I have the following data:

Raw <- sp500_logreturns
Normal <- rnorm(1000, 0, sd(sp500_logreturns))
Student <- cbind(c(rt(1000, df = 2)),c(rt(1000, df = 3)))

And I want to make the following Boxplot sketch

My Raw vector contains the log-returns of my prices, which were downloaded from Yahoo into R as an environment. I must admit I'm quite lost and do not know whether I'm on an impossible mission. I hope I've described my problem well enough together with my sketch. Thank you in advance.

Update 1: The goal is to compare the raw data distribution with candidate distributions. The raw data is leptokurtic, so a Student distribution with 2 or 3 degrees of freedom might be more suitable than a normal distribution. To give you an idea of the data I'm looking at, here's a summary:

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.0418425 -0.0023740  0.0005898  0.0004704  0.0045065  0.0484032  

Here is my boxplot made from Edward's code: Boxplot (Edward)

Update 2: I figured it out. I used fitdist from rugarch to find the Student distribution that best fits the raw data. This way I no longer need to try matching different degrees of freedom for the Student distribution. This is what I will go with:

library(rugarch)  # provides fitdist() and rdist()

fitdist(distribution = 'std', sp500_logreturns)$pars
          mu        sigma        shape 
0.0008121004 0.0113748869 2.3848231857 

data <- data.frame(
        Raw = as.numeric(sp500_logreturns),
        Normal = rnorm(1006, 0, sd(sp500_logreturns)),
        Student = rdist(distribution = 'std', n = 1006, mu = 0.0008121004, sigma = 0.0113748869, shape = 2.3848231857)
)

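library(dplyr)    # %>%, mutate(), summarise(), filter()
library(tidyr)    # pivot_longer()
library(ggplot2)

data2 <- pivot_longer(data, cols=everything()) %>%
        mutate(name=factor(name, levels=c("Raw","Normal","Student")))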

data3 <- data2 %>% summarise(min=min(value), max=max(value))

pbox1 <- (filter(data2, name %in% c("Raw","Normal","Student")) %>%
        ggplot(aes(y=value, fill=name)) +
        geom_boxplot() +
        facet_grid(~name) +
        ylab("Log-returns") +
        ylim(data3$min, data3$max) +
        theme(legend.position = "none",
              axis.ticks.x=element_blank(),
              panel.grid.major.x = element_blank(),
              panel.grid.minor.x = element_blank(),
              axis.text.x=element_text(color="white"))+
        ggtitle("Boxplot comparison")+
        theme(plot.title = element_text(hjust = 0.5)))

And this gives me: Boxplot (final)
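
A minimal sketch of the same fit-then-simulate step that passes the fitted parameters straight from `fitdist()` to `rdist()` instead of retyping them. It assumes the rugarch package is installed. `returns` is a simulated stand-in for `sp500_logreturns`, and the names `returns`, `pars` and `sims` are illustrative only.

```r
library(rugarch)

set.seed(1)
# Assumption: simulated heavy-tailed values stand in for sp500_logreturns
returns <- 0.01 * rt(1006, df = 4)

# Fit a standardized Student-t ("std"); $pars is a named vector (mu, sigma, shape)
pars <- fitdist(distribution = "std", returns)$pars

# Simulate comparison samples of the same length as the raw series
sims <- data.frame(
  Raw     = returns,
  Normal  = rnorm(length(returns), mean = 0, sd = sd(returns)),
  Student = rdist(distribution = "std", n = length(returns),
                  mu = pars[["mu"]], sigma = pars[["sigma"]],
                  shape = pars[["shape"]])
)

summary(sims)
```

Reusing `pars` this way keeps the simulated Student sample in step with the fit if the raw data changes, and `length(returns)` avoids hard-coding the sample size.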

mas2

1 Answer


In base R:

set.seed(11)
data <- data.frame(
  Raw = rnorm(1000),
  Normal = rnorm(1000),
  Student = cbind(c(rt(1000, df = 2)),c(rt(1000, df = 3)))
)

ylim=c(min(data), max(data))

layout(matrix(1:3, ncol=3), widths=c(5,4,5))
par(las=1, mar=c(2,4,5,0))
boxplot(data$Raw, col="steelblue", ylab="Log-returns", ylim=ylim)
title(main="Raw", line=1)

par(mar=c(2,1,5,0))
boxplot(data$Normal, yaxt="n", col="tomato", ylim=ylim)
title(main="Normal", line=1)

par(mar=c(2,1,5,1))
boxplot(data[,3:4], yaxt="n", col=c("green1","green3"), names=c("df = 2","df = 3"), ylim=ylim)
title(main="Student", line=1)
title(main="Boxplot comparison", outer=TRUE, line=-1.5, cex.main=1.5)



In ggplot2, more work is involved:

set.seed(11)
data <- data.frame(
  Raw = rnorm(1000),
  Normal = rnorm(1000),
  Student = cbind(c(rt(1000, df = 2)),c(rt(1000, df = 3)))
)

library(dplyr)
library(tidyr)
library(ggplot2)

data2 <- pivot_longer(data, cols=everything()) %>%
  mutate(name=factor(name, levels=c("Raw","Normal","Student.1","Student.2")))

data3 <- data2 %>% summarise(min=min(value), max=max(value))

p1 <- filter(data2, name %in% c("Raw","Normal")) %>%
  ggplot(aes(y=value, fill=name)) +
  geom_boxplot() +
  facet_grid(~name) +
  ylab("Log-returns") +
  ylim(data3$min, data3$max) +
  theme_bw() +
  theme(legend.position = "none",
        axis.ticks.x=element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text.x=element_text(color="white"))

p2 <- filter(data2, grepl("Student", name)) %>%
  mutate(what="Student") %>%
  ggplot(aes(x=name, y=value, fill=name)) +
  geom_boxplot() +
  scale_fill_manual(values=c("green1","green3")) +
  scale_x_discrete(labels=c("df=2", "df=3")) +
  facet_grid(~what) +
  ylim(data3$min, data3$max) +
  theme_bw() +
  theme(legend.position = "none",
        axis.title.y = element_blank(),
        axis.title.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())

library(ggpubr)
ggarrange(p1, p2)
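
As a possible alternative that needs no plot-arranging package, a similar layout can be approximated with a single faceted plot, letting `facet_grid()` give the Student panel room for two boxes. This is only a sketch under assumptions, not the approach above: the long data frame is built by hand, and the `group` column and the fill colours chosen for Raw and Normal are illustrative.

```r
library(ggplot2)

set.seed(11)
n <- 1000
box_names <- c("Raw", "Normal", "df=2", "df=3")

# Long format: one row per observation, with the box label and its panel
long <- data.frame(
  value = c(rnorm(n), rnorm(n), rt(n, df = 2), rt(n, df = 3)),
  name  = factor(rep(box_names, each = n), levels = box_names),
  group = factor(rep(c("Raw", "Normal", "Student", "Student"), each = n),
                 levels = c("Raw", "Normal", "Student"))
)

p <- ggplot(long, aes(x = name, y = value, fill = name)) +
  geom_boxplot() +
  # free_x drops unused boxes per panel; space = "free_x" widens the Student panel
  facet_grid(~group, scales = "free_x", space = "free_x") +
  scale_fill_manual(values = c("steelblue", "tomato", "green1", "green3")) +
  ylab("Log-returns") +
  theme_bw() +
  theme(legend.position = "none", axis.title.x = element_blank())

print(p)
```

Because all panels share one y-axis by default, no explicit `ylim()` is needed to keep the boxes comparable.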


Edward
  • You are my hero! But do you know how to convert this into ggplot2? I need to stay consistent in my plots to satisfy my "OCD" :-) – mas2 May 13 '20 at 10:09
  • What is "OCD"? And could you show me what your desired graph actually looks like? I'd hate to spend an hour on a graph and then find out it wasn't the way you wanted it. :) *** oh, I figured it out. haha – Edward May 13 '20 at 11:53
  • I use the standard ggplot-look. I don't know if it makes sense, but just want to transform the graph, that you've made into ggplot-look. – mas2 May 13 '20 at 12:18
  • It's perfect! Thank you Edward, this really helped me move on with my thesis! Have a nice day :-) – mas2 May 13 '20 at 12:37
  • Thanks. I'm trying to load `ggpubr` but it won't load. It says "package or namespace load failed for ‘ggpubr’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): namespace ‘tibble’ 2.1.3 is loaded, but >= 3.0.0 is required". But my R won't update the `tibble` package. Do you know any fix? – mas2 May 13 '20 at 13:26
  • I've seen that here before. I think the solution is to restart R (or maybe even your computer). Open R, then install ggpubr. There are other packages that can arrange ggplots. Try cowplot or patchwork – Edward May 13 '20 at 13:34
  • One more thing... My log-returns have quite small values. When I plot my data I get totally flat boxplots for my normal and raw data. How to fix this? – mas2 May 13 '20 at 15:21
  • What's the objective of the graphs? Maybe you can remove the `ylim` command and let the axes fit the data. You'll have to add the y-axis to the Student plot, `p2` (omit this line: `axis.ticks.y=element_blank()`). Tell me exactly what you want. – Edward May 13 '20 at 15:58
  • I've updated once more. This is my final result. Thanks for the help! – mas2 May 13 '20 at 18:48